Client Ref. Nos. CX-99-0045 and CX-99-0061

FUSION PROTEINS OF MYCOBACTERIUM TUBERCULOSIS 

CROSS-REFERENCES TO RELATED APPLICATIONS 
The present application claims priority to U.S. patent application No.
60/158,338, filed October 7, 1999, and U.S. application No. 60/158,425, filed October 7,
1999, herein each incorporated by reference in its entirety.

This application is also related to U.S. patent application No. 09/056,556, filed
April 7, 1998; U.S. patent application No. 09/223,040, filed December 30, 1998; U.S. patent
application No. 09/287,849, filed April 7, 1999; and published PCT application No.
WO99/51748, filed April 7, 1999 (PCT/US99/07717), herein each incorporated by reference
in its entirety.

STATEMENT AS TO RIGHTS TO INVENTIONS MADE UNDER 
FEDERALLY SPONSORED RESEARCH AND DEVELOPMENT 

Not applicable.

BACKGROUND OF THE INVENTION 
Tuberculosis is a chronic infectious disease caused by infection with M.
tuberculosis and other Mycobacterium species. It is a major disease in developing countries,
as well as an increasing problem in developed areas of the world, with about 8 million new
cases and 3 million deaths each year. Although the infection may be asymptomatic for a
considerable period of time, the disease is most commonly manifested as an acute
inflammation of the lungs, resulting in fever and a nonproductive cough. If untreated, serious
complications and death typically result.

Although tuberculosis can generally be controlled using extended antibiotic
therapy, such treatment is not sufficient to prevent the spread of the disease. Infected
individuals may be asymptomatic, but contagious, for some time. In addition, although
compliance with the treatment regimen is critical, patient behavior is difficult to monitor.
Some patients do not complete the course of treatment, which can lead to ineffective
treatment and the development of drug resistance.

In order to control the spread of tuberculosis, effective vaccination and
accurate early diagnosis of the disease are of utmost importance. Currently, vaccination with
live bacteria is the most efficient method for inducing protective immunity. The most
common mycobacterium employed for this purpose is Bacillus Calmette-Guerin (BCG), an
avirulent strain of M. bovis. However, the safety and efficacy of BCG are a source of
controversy and some countries, such as the United States, do not vaccinate the general
public with this agent.

Diagnosis of tuberculosis is commonly achieved using a skin test, which
involves intradermal exposure to tuberculin PPD (purified protein derivative). Antigen-specific
T cell responses result in measurable induration at the injection site by 48-72 hours
after injection, which indicates exposure to mycobacterial antigens. Sensitivity and
specificity have, however, been a problem with this test, and individuals vaccinated with
BCG cannot be distinguished from infected individuals.

While macrophages have been shown to act as the principal effectors of
Mycobacterium immunity, T cells are the predominant inducers of such immunity. The
essential role of T cells in protection against Mycobacterium infection is illustrated by the
frequent occurrence of Mycobacterium infection in AIDS patients, due to the depletion of
CD4+ T cells associated with human immunodeficiency virus (HIV) infection.
Mycobacterium-reactive CD4+ T cells have been shown to be potent producers of γ-interferon
(IFN-γ), which, in turn, has been shown to trigger the anti-mycobacterial effects of
macrophages in mice. While the role of IFN-γ in humans is less clear, studies have shown
that 1,25-dihydroxy-vitamin D3, either alone or in combination with IFN-γ or tumor necrosis
factor-alpha, activates human macrophages to inhibit M. tuberculosis infection. Furthermore,
it is known that IFN-γ stimulates human macrophages to make 1,25-dihydroxy-vitamin D3.
Similarly, interleukin-12 (IL-12) has been shown to play a role in stimulating resistance to M.
tuberculosis infection. For a review of the immunology of M. tuberculosis infection, see
Chan & Kaufmann, Tuberculosis: Pathogenesis, Protection and Control (Bloom ed., 1994),
and Harrison's Principles of Internal Medicine, volume 1, pp. 1004-1014 and 1019-1023
(14th ed., Fauci et al., eds., 1998).

Accordingly, there is a need for improved diagnostic reagents, and improved
methods for diagnosing, preventing and treating tuberculosis.

SUMMARY OF THE INVENTION

The present invention provides pharmaceutical compositions comprising at
least two heterologous antigens, fusion proteins comprising the antigens, and nucleic acids
encoding the antigens, where the antigens are from a Mycobacterium species from the
tuberculosis complex and other Mycobacterium species that cause opportunistic infections in
immune compromised patients. The present invention also relates to methods of using the
polypeptides and polynucleotides in the diagnosis, treatment and prevention of
Mycobacterium infection.

The present invention is based, in part, on the inventors' discovery that fusion
polynucleotides, fusion polypeptides, or compositions that contain at least two heterologous
M. tuberculosis coding sequences or antigens are highly antigenic and upon administration to
a patient increase the sensitivity of tuberculosis sera. In addition, the compositions, fusion
polypeptides and polynucleotides are useful as diagnostic tools in patients that may have been
infected with Mycobacterium.

In one aspect, the compositions, fusion polypeptides, and nucleic acids of the
invention are used in in vitro and in vivo assays for detecting humoral antibodies or
cell-mediated immunity against M. tuberculosis for diagnosis of infection or monitoring of
disease progression. For example, the polypeptides may be used as an in vivo diagnostic
agent in the form of an intradermal skin test. The polypeptides may also be used in in vitro
tests such as an ELISA with patient serum. Alternatively, the nucleic acids, the
compositions, and the fusion polypeptides may be used to raise anti-M. tuberculosis
antibodies in a non-human animal. The antibodies can be used to detect the target antigens in
vivo and in vitro.

In another aspect, the compositions, fusion polypeptides and nucleic acids may
be used as immunogens to generate or elicit a protective immune response in a patient. The
isolated or purified polynucleotides are used to produce recombinant fusion polypeptide
antigens in vitro, which are then administered as a vaccine. Alternatively, the
polynucleotides may be administered directly into a subject as DNA vaccines to cause
antigen expression in the subject, and the subsequent induction of an anti-M. tuberculosis
immune response. Thus, the isolated or purified M. tuberculosis polypeptides and nucleic
acids of the invention may be formulated as pharmaceutical compositions for administration
to a subject in the prevention and/or treatment of M. tuberculosis infection. The
immunogenicity of the fusion proteins or antigens may be enhanced by the inclusion of an
adjuvant, as well as additional fusion polypeptides, from Mycobacterium or other organisms,
such as bacterial, viral, or mammalian polypeptides. Additional polypeptides may also be
included in the compositions, either linked or unlinked to the fusion polypeptide or
compositions.

BRIEF DESCRIPTION OF THE DRAWINGS
Figure 1 shows the nucleic acid sequence of a vector encoding TbF14 (SEQ
ID NO:89). Nucleotides 5096 to 8594 encode TbF14 (SEQ ID NO:51). Nucleotides 5072 to
5095 encode the eight amino acid His tag (SEQ ID NO:90); nucleotides 5096 to 7315 encode
the MTb81 antigen (SEQ ID NO:1); and nucleotides 7316 to 8594 encode the Mo2 antigen
(SEQ ID NO:3).
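The nucleotide coordinates quoted for Figure 1 can be checked for internal consistency with a short script. The following is only an illustrative sketch: it assumes the coordinates are 1-based and inclusive, and the names TBF14_SEGMENTS, segment_length and is_contiguous are hypothetical labels introduced here, not part of the specification.

```python
# Coordinates as quoted in the Figure 1 description.
# Assumption: positions are 1-based and inclusive.
TBF14_SEGMENTS = [
    ("His tag", 5072, 5095),
    ("MTb81", 5096, 7315),
    ("Mo2", 7316, 8594),
]


def segment_length(start: int, end: int) -> int:
    """Length in nucleotides of a 1-based, inclusive range."""
    return end - start + 1


def is_contiguous(segments) -> bool:
    """True when each segment starts one nucleotide after the previous one ends."""
    return all(
        nxt[1] == prev[2] + 1 for prev, nxt in zip(segments, segments[1:])
    )


# Length of every annotated segment, keyed by its label.
lengths = {name: segment_length(s, e) for name, s, e in TBF14_SEGMENTS}

if __name__ == "__main__":
    for name, n in lengths.items():
        print(f"{name}: {n} nt")
    print("contiguous:", is_contiguous(TBF14_SEGMENTS))
```

Under these assumptions the His tag region is 24 nucleotides long, which matches eight codons, and the MTb81 and Mo2 ranges together span the same positions as the range given for TbF14 (5096 to 8594).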

Figure 2 shows the nucleic acid sequence of a vector encoding TbF15 (SEQ
ID NO:91). Nucleotides 5096 to 8023 encode the TbF15 fusion protein (SEQ ID NO:53).
Nucleotides 5072 to 5095 encode the eight amino acid His tag region (SEQ ID NO:90);
nucleotides 5096 to 5293 encode the Ra3 antigen (SEQ ID NO:5); nucleotides 5294 to 6346
encode the 38 kD antigen (SEQ ID NO:7); nucleotides 6347 to 6643 encode the 38-1 antigen
(SEQ ID NO:9); and nucleotides 6644 to 8023 encode the FL TbH4 antigen (SEQ ID
NO:11).

Figure 3 shows the amino acid sequence of TbF14 (SEQ ID NO:52), including
the eight amino acid His tag at the N-terminus.

Figure 4 shows the amino acid sequence of TbF15 (SEQ ID NO:54), including
the eight amino acid His tag at the N-terminus.

Figure 5 shows ELISA results using fusion proteins of the invention.

Figure 6 shows the nucleic acid and the predicted amino acid sequences of the
entire open reading frame of HTCC#1 FL (SEQ ID NO:13 and 14, respectively).

Figure 7 shows the nucleic acid and predicted amino acid sequences of three
fragments of HTCC#1. (a) and (b) show the sequences of two overlapping fragments: an
amino terminal half fragment (residues 1 to 223), comprising the first trans-membrane
domain (a) and a carboxy terminal half fragment (residues 184 to 392), comprising the last
two trans-membrane domains (b); (c) shows a truncated amino-terminal half fragment
(residues 1 to 128) devoid of the trans-membrane domain.
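The relationship between the three HTCC#1 fragments of Figure 7 can be illustrated with simple interval arithmetic. This is a minimal sketch, assuming the residue ranges are 1-based and inclusive and that residue 392 is the last residue of the full-length protein (inferred from the carboxy terminal fragment, not stated outright). The names FRAGMENTS, overlap and covers are hypothetical helpers introduced for illustration.

```python
# Residue ranges as quoted in the Figure 7 description.
# Assumption: 1-based, inclusive numbering.
FRAGMENTS = {
    "a": (1, 223),    # amino terminal half fragment
    "b": (184, 392),  # carboxy terminal half fragment
    "c": (1, 128),    # truncated amino-terminal half fragment
}


def overlap(r1, r2):
    """Return the shared (start, end) of two inclusive ranges, or None."""
    start = max(r1[0], r2[0])
    end = min(r1[1], r2[1])
    return (start, end) if start <= end else None


def covers(ranges, first, last):
    """True when the union of the ranges includes every residue first..last."""
    position = first  # next residue that still needs to be covered
    for start, end in sorted(ranges):
        if start > position:
            return False  # gap before this range begins
        position = max(position, end + 1)
    return position > last


if __name__ == "__main__":
    print("a/b overlap:", overlap(FRAGMENTS["a"], FRAGMENTS["b"]))
    print("a+b cover 1-392:", covers([FRAGMENTS["a"], FRAGMENTS["b"]], 1, 392))
```

With these ranges, fragments (a) and (b) share residues 184 to 223 and together span residues 1 to 392, whereas the truncated fragment (c) does not overlap fragment (b).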

Figure 8 shows the nucleic acid and predicted amino acid sequences of a
TbRa12-HTCC#1 fusion protein (SEQ ID NO:63 and 64, respectively).

Figure 9a shows the nucleic acid and predicted amino acid sequences of a
recombinant HTCC#1 lacking the first trans-membrane domain (deletion of amino acid
residues 150 to 160). Figure 9b shows the nucleic acid and predicted amino acid sequences
of 30 overlapping peptides of HTCC#1 used for the T-cell epitope mapping. Figure 9c
illustrates the results of the T-cell epitope mapping of HTCC#1. Figure 9d shows the nucleic
acid and predicted amino acid sequences of a deletion construct of HTCC#1 lacking all the
trans-membrane domains (deletion of amino acid residues 101 to 203).

Figure 10 shows the nucleic acid and predicted amino acid sequences of the
fusion protein HTCC#1(184-392)-TbH9-HTCC#1(1-129) (SEQ ID NO:57 and 58,
respectively).

Figure 11 shows the nucleic acid and predicted amino acid sequences of the
fusion protein HTCC#1(1-149)-TbH9-HTCC#1(161-392) (SEQ ID NO:59 and 60,
respectively).

Figure 12 shows the nucleic acid and predicted amino acid sequences of the
fusion protein HTCC#1(184-392)-TbH9-HTCC#1(1-200) (SEQ ID NO:61 and 62,
respectively).

Figure 13 shows the nucleotide sequence of Mycobacterium tuberculosis
antigen MTb59 (SEQ ID NO:49).

Figure 14 shows the amino acid sequence of Mycobacterium tuberculosis
antigen MTb59 (SEQ ID NO:50).

Figure 15 shows the nucleotide sequence of Mycobacterium tuberculosis
antigen MTb82 (SEQ ID NO:47).

Figure 16 shows the amino acid sequence of Mycobacterium tuberculosis
antigen MTb82 (SEQ ID NO:48).

Figure 17 shows the amino acid sequence of the secreted form of
Mycobacterium tuberculosis antigen DPPD (SEQ ID NO:44).

DESCRIPTION OF SEQUENCES 

SEQ ID NO:1 is the nucleic acid sequence encoding the Mtb81 antigen.

SEQ ID NO:2 is the amino acid sequence of the Mtb81 antigen.

SEQ ID NO:3 is the nucleic acid sequence encoding the Mo2 antigen.

SEQ ID NO:4 is the amino acid sequence of the Mo2 antigen.

SEQ ID NO:5 is the nucleic acid sequence encoding the TbRa3 antigen.

SEQ ID NO:6 is the amino acid sequence of the TbRa3 antigen.

SEQ ID NO:7 is the nucleic acid sequence encoding the 38kD antigen.

SEQ ID NO:8 is the amino acid sequence of the 38kD antigen.

SEQ ID NO:9 is the nucleic acid sequence encoding the Tb38-1 antigen.

SEQ ID NO:10 is the amino acid sequence of the Tb38-1 antigen.

SEQ ID NO:11 is the nucleic acid sequence encoding the full-length (FL)
TbH4 antigen.

SEQ ID NO:12 is the amino acid sequence of the FL TbH4 antigen.

SEQ ID NO:13 is the nucleic acid sequence encoding the HTCC#1 (Mtb40)
antigen.

SEQ ID NO:14 is the amino acid sequence of the HTCC#1 antigen.

SEQ ID NO:15 is the nucleic acid sequence of an amino terminal half
fragment (residues 1 to 223) of HTCC#1, comprising the first trans-membrane domain.

SEQ ID NO:16 is the predicted amino acid sequence of an amino terminal half
fragment (residues 1 to 223) of HTCC#1.

SEQ ID NO:17 is the nucleic acid sequence of a carboxy terminal half
fragment (residues 184 to 392) of HTCC#1, comprising the last two trans-membrane
domains.

SEQ ID NO:18 is the predicted amino acid sequence of a carboxy terminal
half fragment (residues 184 to 392) of HTCC#1.

SEQ ID NO:19 is the nucleic acid sequence of a truncated amino-terminal half
fragment (residues 1 to 128) of HTCC#1 devoid of the trans-membrane domain.

SEQ ID NO:20 is the predicted amino acid sequence of a truncated
amino-terminal half fragment (residues 1 to 128) of HTCC#1.

SEQ ID NO:21 is the nucleic acid sequence of a recombinant HTCC#1
lacking the first trans-membrane domain (deletion of amino acid residues 150 to 160).

SEQ ID NO:22 is the predicted amino acid sequence of a recombinant
HTCC#1 lacking the first trans-membrane domain (deletion of amino acid residues 150 to
160).

SEQ ID NO:23 is the nucleic acid sequence of a deletion construct of
HTCC#1 lacking all the trans-membrane domains (deletion of amino acid residues 101 to
203).

SEQ ID NO:24 is the predicted amino acid sequence of a deletion construct of
HTCC#1 lacking all the trans-membrane domains (deletion of amino acid residues 101 to
203).

SEQ ID NO:25 is the nucleic acid sequence encoding the TbH9 (Mtb39A)
antigen.

SEQ ID NO:26 is the amino acid sequence of the TbH9 antigen.

SEQ ID NO:27 is the nucleic acid sequence encoding the TbRa12 antigen.

SEQ ID NO:28 is the amino acid sequence of the TbRa12 antigen.

SEQ ID NO:29 is the nucleic acid sequence encoding the TbRa35 (Mtb32A)
antigen.

SEQ ID NO:30 is the amino acid sequence of the TbRa35 antigen.

SEQ ID NO:31 is the nucleic acid sequence encoding the MTCC#2 (Mtb41)
antigen.

SEQ ID NO:32 is the amino acid sequence of the MTCC#2 antigen.

SEQ ID NO:33 is the nucleic acid sequence encoding the MTI (Mtb9.9A)
antigen.

SEQ ID NO:34 is the amino acid sequence of the MTI antigen.

SEQ ID NO:35 is the nucleic acid sequence encoding the MSL (Mtb9.8)
antigen.

SEQ ID NO:36 is the amino acid sequence of the MSL antigen.

SEQ ID NO:37 is the nucleic acid sequence encoding the DPV (Mtb8.4)
antigen.

SEQ ID NO:38 is the amino acid sequence of the DPV antigen.

SEQ ID NO:39 is the nucleic acid sequence encoding the DPEP antigen.

SEQ ID NO:40 is the amino acid sequence of the DPEP antigen.

SEQ ID NO:41 is the nucleic acid sequence encoding the Erd14 (Mtb16)
antigen.

SEQ ID NO:42 is the amino acid sequence of the Erd14 antigen.

SEQ ID NO:43 is the nucleic acid sequence encoding the DPPD antigen.

SEQ ID NO:44 is the amino acid sequence of the DPPD antigen.

SEQ ID NO:45 is the nucleic acid sequence encoding the ESAT-6 antigen.

SEQ ID NO:46 is the amino acid sequence of the ESAT-6 antigen.

SEQ ID NO:47 is the nucleic acid sequence encoding the Mtb82 (Mtb867)
antigen.

SEQ ID NO:48 is the amino acid sequence of the Mtb82 antigen.

SEQ ID NO:49 is the nucleic acid sequence encoding the Mtb59 (Mtb403)
antigen.

SEQ ID NO:50 is the amino acid sequence of the Mtb59 antigen.

SEQ ID NO:51 is the nucleic acid sequence encoding the TbF14 fusion
protein.

SEQ ID NO:52 is the amino acid sequence of the TbF14 fusion protein.

SEQ ID NO:53 is the nucleic acid sequence encoding the TbF15 fusion
protein.

SEQ ID NO:54 is the amino acid sequence of the TbF15 fusion protein.

SEQ ID NO:55 is the nucleic acid sequence of the fusion protein
HTCC#1(FL)-TbH9(FL).

SEQ ID NO:56 is the amino acid sequence of the fusion protein
HTCC#1(FL)-TbH9(FL).

SEQ ID NO:57 is the nucleic acid sequence of the fusion protein
HTCC#1(184-392)-TbH9-HTCC#1(1-129).

SEQ ID NO:58 is the predicted amino acid sequence of the fusion protein
HTCC#1(184-392)-TbH9-HTCC#1(1-129).

SEQ ID NO:59 is the nucleic acid sequence of the fusion protein
HTCC#1(1-149)-TbH9-HTCC#1(161-392).

SEQ ID NO:60 is the predicted amino acid sequence of the fusion protein
HTCC#1(1-149)-TbH9-HTCC#1(161-392).

SEQ ID NO:61 is the nucleic acid sequence of the fusion protein
HTCC#1(184-392)-TbH9-HTCC#1(1-200).

SEQ ID NO:62 is the predicted amino acid sequence of the fusion protein
HTCC#1(184-392)-TbH9-HTCC#1(1-200).

SEQ ID NO:63 is the nucleic acid sequence of the TbRa12-HTCC#1 fusion
protein.

SEQ ID NO:64 is the predicted amino acid sequence of the TbRa12-HTCC#1
fusion protein.

SEQ ID NO:65 is the nucleic acid sequence of the TbF (TbRa3, 38kD,
Tb38-1) fusion protein.

SEQ ID NO:66 is the predicted amino acid sequence of the TbF fusion
protein.

SEQ ID NO:67 is the nucleic acid sequence of the TbF2 (TbRa3, 38kD,
Tb38-1, DPEP) fusion protein.

SEQ ID NO:68 is the predicted amino acid sequence of the TbF2 fusion
protein.

SEQ ID NO:69 is the nucleic acid sequence of the TbF6 (TbRa3, 38kD,
Tb38-1, TbH4) fusion protein.

SEQ ID NO:70 is the predicted amino acid sequence of the TbF6 fusion
protein.

SEQ ID NO:71 is the nucleic acid sequence of the TbF8 (38kD-linker-DPEP)
fusion protein.

SEQ ID NO:72 is the predicted amino acid sequence of the TbF8 fusion
protein.

SEQ ID NO:73 is the nucleic acid sequence of the Mtb36F (Erd14-DPV-MTI)
fusion protein.

SEQ ID NO:74 is the predicted amino acid sequence of the Mtb36F fusion
protein.

SEQ ID NO:75 is the nucleic acid sequence of the Mtb88F
(Erd14-DPV-MTI-MSL-MTCC#2) fusion protein.

SEQ ID NO:76 is the predicted amino acid sequence of the Mtb88F fusion
protein.

SEQ ID NO:77 is the nucleic acid sequence of the Mtb46F
(Erd14-DPV-MTI-MSL) fusion protein.

SEQ ID NO:78 is the predicted amino acid sequence of the Mtb46F fusion
protein.

SEQ ID NO:79 is the nucleic acid sequence of the Mtb71F
(DPV-MTI-MSL-MTCC#2) fusion protein.

SEQ ID NO:80 is the predicted amino acid sequence of the Mtb71F fusion
protein.

SEQ ID NO:81 is the nucleic acid sequence of the Mtb31F (DPV-MTI-MSL)
fusion protein.

SEQ ID NO:82 is the predicted amino acid sequence of the Mtb31F fusion
protein.

SEQ ID NO:83 is the nucleic acid sequence of the Mtb61F (TbH9-DPV-MTI)
fusion protein.

SEQ ID NO:84 is the predicted amino acid sequence of the Mtb61F fusion
protein.

SEQ ID NO:85 is the nucleic acid sequence of the Ra12-DPPD (Mtb24F)
fusion protein.

SEQ ID NO:86 is the predicted amino acid sequence of the Ra12-DPPD
fusion protein.

SEQ ID NO:87 is the nucleic acid sequence of the Mtb72F
(TbRa12-TbH9-TbRa35) fusion protein.

SEQ ID NO:88 is the predicted amino acid sequence of the Mtb72F fusion
protein.

SEQ ID NO:89 is the nucleic acid sequence of the Mtb59F (TbH9-TbRa35)
fusion protein.

SEQ ID NO:90 is the predicted amino acid sequence of the Mtb59F fusion
protein.

SEQ ID NO:91 is the nucleic acid sequence of a vector encoding TbF14. 

SEQ ID NO:92 is the nucleotide sequence of the region spanning nucleotides 
5072 to 5095 of SEQ ID NO:91 encoding the eight amino acid His tag. 

SEQ ID NO:93 is the nucleic acid sequence of a vector encoding TbF15. 

SEQ ID NO:94-123 are the nucleic acid sequences of 30 overlapping peptides 
of HTCC#1 used for the T-cell epitope mapping. 

SEQ ID NO:124-153 are the predicted amino acid sequences of 30 
overlapping peptides of HTCC#1 used for the T-cell epitope mapping. 

DETAILED DESCRIPTION OF THE INVENTION 

I. INTRODUCTION 

The present invention relates to compositions comprising antigen
compositions and fusion polypeptides useful for the diagnosis and treatment of
Mycobacterium infection, polynucleotides encoding such antigens, and methods for their use.
The antigens of the present invention are polypeptides or fusion polypeptides of
Mycobacterium antigens and immunogenic fragments thereof. More specifically, the
compositions of the present invention comprise at least two heterologous polypeptides of a
Mycobacterium species of the tuberculosis complex, e.g., a species such as M. tuberculosis,
M. bovis, or M. africanum, or a Mycobacterium species that is environmental or opportunistic
and that causes opportunistic infections such as lung infections in immune compromised
hosts (e.g., patients with AIDS), e.g., BCG, M. avium, M. intracellulare, M. celatum, M.
genavense, M. haemophilum, M. kansasii, M. simiae, M. vaccae, M. fortuitum, and M.
scrofulaceum (see, e.g., Harrison's Principles of Internal Medicine, volume 1, pp. 1004-1014
and 1019-1023 (14th ed., Fauci et al., eds., 1998)). The inventors of the present application
surprisingly discovered that compositions and fusion proteins comprising at least two
heterologous Mycobacterium antigens, or immunogenic fragments thereof, were highly
antigenic. These compositions, fusion polypeptides, and the nucleic acids that encode them
are therefore useful for eliciting a protective response in patients, and for diagnostic
applications.

The antigens of the present invention may further comprise other components
designed to enhance the antigenicity of the antigens or to improve these antigens in other
aspects, for example, the isolation of these antigens through addition of a stretch of histidine
residues at one end of the antigen. The compositions, fusion polypeptides, and nucleic acids
of the invention can comprise additional copies of antigens, or additional heterologous
polypeptides from Mycobacterium species, such as, e.g., MTb81, Mo2, TbRa3, 38 kD (with
the N-terminal cysteine residue), Tb38-1, FL TbH4, HTCC#1, TbH9, MTCC#2, MTI, MSL,
TbRa35, DPV, DPEP, Erd14, TbRa12, DPPD, MTb82, MTb59, ESAT-6, MTB85 complex,
or α-crystalline. Such fusion polypeptides are also referred to as polyproteins. The
compositions, fusion polypeptides, and nucleic acids of the invention can also comprise
additional polypeptides from other sources. For example, the compositions and fusion
proteins of the invention can include polypeptides or nucleic acids encoding polypeptides,
wherein the polypeptide enhances expression of the antigen, e.g., NS1, an influenza virus
protein, or an immunogenic portion thereof (see, e.g., WO99/40188 and WO93/04175). The
nucleic acids of the invention can be engineered based on codon preference in a species of
choice, e.g., humans.

The compositions of the invention can be naked DNA, or the compositions,
e.g., polypeptides, can also comprise adjuvants such as, for example, AS2, AS2', AS2'',
AS4, AS6, ENHANZYN (Detox), MPL, QS21, CWS, TDM, AGPs, CPG, LeIF, saponin, and
saponin mimetics, and derivatives thereof.

In one aspect, the compositions and fusion proteins of the invention are
composed of at least two antigens selected from the group consisting of an MTb81 antigen or
an immunogenic fragment thereof from a Mycobacterium species of the tuberculosis
complex, and an Mo2 antigen or an immunogenic fragment thereof from a Mycobacterium
species of the tuberculosis complex. In one embodiment, the compositions of the invention
comprise the TbF14 fusion protein. The complete nucleotide sequence encoding TbF14 is set
forth in SEQ ID NO:51, and the amino acid sequence of TbF14 is set forth in SEQ ID NO:52.

In another aspect, the compositions and fusion proteins of the invention are
composed of at least four antigens selected from the group consisting of a TbRa3 antigen or
an immunogenic fragment thereof from a Mycobacterium species of the tuberculosis
complex, a 38 kD antigen or an immunogenic fragment thereof from a Mycobacterium
species of the tuberculosis complex, a Tb38-1 antigen or an immunogenic fragment thereof
from a Mycobacterium species of the tuberculosis complex, and a FL TbH4 antigen or an
immunogenic fragment thereof from a Mycobacterium species of the tuberculosis complex.
In one embodiment, the compositions of the invention comprise the TbF15 fusion protein.
The nucleic acid and amino acid sequences of TbF15 are set forth in SEQ ID NO:53 and 54,
respectively.

In another aspect, the compositions and fusion proteins of the invention are
composed of at least two antigens selected from the group consisting of an HTCC#1 antigen
or an immunogenic fragment thereof from a Mycobacterium species of the tuberculosis
complex, and a TbH9 antigen or an immunogenic fragment thereof from a Mycobacterium
species of the tuberculosis complex. In one embodiment, the compositions of the invention
comprise the HTCC#1(FL)-TbH9(FL) fusion protein. The nucleic acid and amino acid
sequences of HTCC#1-TbH9 are set forth in SEQ ID NO:55 and 56, respectively. In another
embodiment, the compositions of the invention comprise the fusion protein
HTCC#1(184-392)/TbH9/HTCC#1(1-129). The nucleic acid and amino acid sequences of
HTCC#1(184-392)/TbH9/HTCC#1(1-129) are set forth in SEQ ID NO:57 and 58, respectively. In yet
another embodiment, the compositions of the invention comprise the fusion protein
HTCC#1(1-149)/TbH9/HTCC#1(161-392), having the nucleic acid and amino acid
sequences set forth in SEQ ID NO:59 and 60, respectively. In still another embodiment, the
compositions of the invention comprise the fusion protein
HTCC#1(184-392)/TbH9/HTCC#1(1-200), having the nucleic acid and amino acid sequences set forth in
SEQ ID NO:61 and 62, respectively.

In a different aspect, the compositions and fusion proteins of the invention are
composed of at least two antigens selected from the group consisting of an HTCC#1 antigen
or an immunogenic fragment thereof from a Mycobacterium species of the tuberculosis
complex, and a TbRa12 antigen or an immunogenic fragment thereof from a Mycobacterium
species of the tuberculosis complex. In one embodiment, the compositions of the invention
comprise the fusion protein TbRa12-HTCC#1. The nucleic acid and amino acid sequences of
the TbRa12-HTCC#1 fusion protein are set forth in SEQ ID NO:63 and 64, respectively.

In yet another aspect, the compositions and fusion proteins of the invention are
composed of at least two antigens selected from the group consisting of a TbH9 (MTB39)
antigen or an immunogenic fragment thereof from a Mycobacterium species of the
tuberculosis complex, and a TbRa35 (MTB32A) antigen or an immunogenic fragment thereof
from a Mycobacterium species of the tuberculosis complex. In one embodiment, the antigens
are selected from the group consisting of a TbH9 (MTB39) antigen or an immunogenic
fragment thereof from a Mycobacterium species of the tuberculosis complex, and a
polypeptide comprising at least 205 amino acids of the N-terminus of a TbRa35 (MTB32A)
antigen from a Mycobacterium species of the tuberculosis complex. In another embodiment,
the antigens are selected from the group consisting of a TbH9 (MTB39) antigen or an
immunogenic fragment thereof from a Mycobacterium species of the tuberculosis complex, a
polypeptide comprising at least 205 amino acids of the N-terminus of a TbRa35 (MTB32A)
antigen from a Mycobacterium species of the tuberculosis complex, and a polypeptide
comprising at least about 132 amino acids from the C-terminus of a TbRa35 (MTB32A)
antigen from a Mycobacterium species of the tuberculosis complex.

In yet another embodiment, the compositions of the invention comprise the 
Mtb59F fusion protein. The nucleic acid and amino acid sequences of the Mtb59F fusion 
protein are set forth in SEQ ID NO:89 and 90, respectively, as well as in the U.S. patent 
application No. 09/287,849 and in the PCT/US99/07717 application. In another embodiment, 
the compositions of the invention comprise the Mtb72F fusion protein having the nucleic acid 
and amino acid sequences set forth in SEQ ID NO:87 and 88, respectively. The Mtb72F 
fusion protein is also disclosed in the U.S. patent application Nos. 09/223,040 and 
09/223,040; and in the PCT/US99/07717 application. 

In yet another aspect, the compositions and fusion proteins of the invention
comprise at least two antigens selected from the group consisting of MTb81, Mo2, TbRa3,
38kD, Tb38-1 (MTb11), FL TbH4, HTCC#1 (Mtb40), TbH9, MTCC#2 (Mtb41), DPEP,
DPPD, TbRa35, TbRa12, MTb59, MTb82, Erd14 (Mtb16), FL TbRa35 (Mtb32A), DPV
(Mtb8.4), MSL (Mtb9.8), MTI (Mtb9.9A, also known as MTI-A), ESAT-6, α-crystalline, and
85 complex, or an immunogenic fragment thereof from a Mycobacterium species of the
tuberculosis complex.

In another aspect, the fusion proteins of the invention are:

TbRa3-38 kD-Tb38-1 (TbF), the sequence of which is disclosed in SEQ ID
NO:65 (DNA) and SEQ ID NO:66 (protein), as well as in the U.S. patent application Nos.
08/818,112; 08/818,111; and 09/056,556; and in the WO98/16646 and WO98/16645
applications;

TbRa3-38kD-Tb38-1-DPEP (TbF2), the sequence of which is disclosed in
SEQ ID NO:67 (DNA) and SEQ ID NO:68 (protein), and in the U.S. patent application Nos.
08/942,578; 08/942,341; 09/056,556; and in the WO98/16646 and WO98/16645 applications;

TbRa3-38kD-Tb38-1-TBH4 (TbF6), the sequence of which is disclosed in
SEQ ID NO:69 (DNA) and SEQ ID NO:70 (protein) in the U.S. patent application Nos.
08/072,967; 09/072,596; and in the PCT/US99/03268 and PCT/US99/03265 applications;

38kD-Linker-DPEP (TbF8), the sequence of which is disclosed in SEQ ID
NO:71 (DNA) and SEQ ID NO:72 (protein), and in the U.S. patent application Nos.
09/072,967 and 09/072,596; as well as in the PCT/US99/03268 and PCT/US99/03265
applications;

Erd14-DPV-MTI (MTb36F), the sequence of which is disclosed in SEQ ID
NO:73 (DNA), SEQ ID NO:74 (protein), as well as in the U.S. patent application Nos.
09/223,040 and 09/287,849; and in the PCT/US99/07717 application;

Erd14-DPV-MTI-MSL-MTCC#2 (MTb88F), the sequence of which is
disclosed in SEQ ID NO:75 (cDNA) and SEQ ID NO:76 (protein), as well as in the U.S.
patent application No. 09/287,849 and in the PCT/US99/07717 application;

Erd14-DPV-MTI-MSL (MTb46F), the sequence of which is disclosed in SEQ
ID NO:77 (cDNA) and SEQ ID NO:78 (protein), and in the U.S. patent application No.
09/287,849 and in the PCT/US99/07717 application;

DPV-MTI-MSL-MTCC#2 (MTb71F), the sequence of which is disclosed in
SEQ ID NO:79 (cDNA) and SEQ ID NO:80 (protein), as well as in the U.S. patent
application No. 09/287,849 and in the PCT/US99/07717 application;

DPV-MTI-MSL (MTb31F), the sequence of which is disclosed in SEQ ID
NO:81 (cDNA) and SEQ ID NO:82 (protein), and in the U.S. patent application No.
09/287,849 and in the PCT/US99/07717 application;

TbH9-DPV-MTI (MTb61F), the sequence of which is disclosed in SEQ ID
NO:83 (cDNA) and SEQ ID NO:84 (protein) (see, also, U.S. patent application No.
09/287,849 and PCT/US99/07717 application);

Ra12-DPPD (MTb24F), the sequence of which is disclosed in SEQ ID NO:85
(cDNA) and SEQ ID NO:86 (protein), as well as in the U.S. patent application No.
09/287,849 and in the PCT/US99/07717 application.

In the nomenclature of the application, TbRa35 refers to the N-terminus of 
MTB32A (TbRa35FL), comprising at least about the first 205 amino acids of MTB32A from 
M. tuberculosis, or the corresponding region from another Mycobacterium species. TbRa12
refers to the C-terminus of MTB32A (TbRa35FL), comprising at least about the last 132
amino acids from MTB32A from M. tuberculosis, or the corresponding region from another 
Mycobacterium species. 

The following provides sequences of some individual antigens used in the 
compositions and fusion proteins of the invention:

Mtb81, the sequence of which is disclosed in SEQ ID NO:1 (DNA) and SEQ
ID NO:2 (predicted amino acid).

Mo2, the sequence of which is disclosed in SEQ ID NO:3 (DNA) and SEQ ID
NO:4 (predicted amino acid).

Tb38-1 or 38-1 (MTb11), the sequence of which is disclosed in SEQ ID NO:9
(DNA) and SEQ ID NO:10 (predicted amino acid), and is also disclosed in the U.S. patent
application Nos. 09/072,96; 08/523,436; 08/523,435; 08/818,112; and 08/818,111; and in the
WO97/09428 and WO97/09429 applications;

TbRa3, the sequence of which is disclosed in SEQ ID NO:5 (DNA) and SEQ
ID NO:6 (predicted amino acid sequence) (see, also, WO97/09428 and WO97/09429
applications);

38 kD, the sequence of which is disclosed in SEQ ID NO:7 (DNA) and SEQ
ID NO:8 (predicted amino acid sequence), as well as in the U.S. patent application No.
09/072,967. 38 kD has two alternative forms, with and without the N-terminal cysteine
residue;

DPEP, the sequence of which is disclosed in SEQ ID NO:39 (DNA) and SEQ 
ID NO:40 (predicted amino acid sequence), and in the WO97/09428 and WO97/09429 
publications; 

TbH4, the sequence of which is disclosed as SEQ ID NO:11 (DNA) and SEQ
ID NO:12 (predicted amino acid sequence) (see, also, WO97/09428 and WO97/09429
publications);

Erd14 (MTb16), the cDNA and amino acid sequences of which are disclosed
in SEQ ID NO:41 (DNA) and SEQ ID NO:42 (predicted amino acid), and in Verbon et al.,
J. Bacteriology 174:1352-1359 (1992);

DPPD, the sequence of which is disclosed in SEQ ID NO:43 (DNA) and SEQ
ID NO:44 (predicted amino acid sequence), and in the PCT/US99/03268 and
PCT/US99/03265 applications. The secreted form of DPPD is shown herein in Figure 12;

MTb82 (MTb867), the sequence of which is disclosed in SEQ ID NO:47
(DNA) and SEQ ID NO:48 (predicted amino acid sequence), and in Figures 8 (DNA) and 9
(amino acid);

MTb59 (MTb403), the sequence of which is disclosed in SEQ ID NO:49
(DNA) and SEQ ID NO:50 (predicted amino acid sequence), and in Figures 10 (DNA) and
11 (amino acid);

TbRa35 FL (MTB32A), the sequence of which is disclosed as SEQ ID NO:29
(cDNA) and SEQ ID NO:30 (protein), and in the U.S. patent application Nos. 08/523,436,
08/523,435; 08/658,800; 08/659,683; 08/818,112; 09/056,556; and 08/818,111; as well as in
the WO97/09428 and WO97/09429 applications; see also Skeiky et al., Infection and
Immunity 67:3998-4007 (1999);

TbRa12, the C-terminus of MTB32A (TbRa35FL), comprising at least about
the last 132 amino acids from MTB32A from M. tuberculosis, the sequence of which is
disclosed as SEQ ID NO:27 (DNA) and SEQ ID NO:28 (predicted amino acid sequence)
(see, also, U.S. patent application No. 09/072,967; and WO97/09428 and WO97/09429
publications);

TbRa35, the N-terminus of MTB32A (TbRa35FL), comprising at least about
the first 205 amino acids of MTB32A from M. tuberculosis, the nucleotide and amino acid
sequence of which is disclosed in Figure 4;

TbH9 (MTB39), the sequence of which is disclosed in SEQ ID NO:25 (cDNA
full length) and SEQ ID NO:26 (protein full length), as well as in the U.S. patent application
Nos. 08/658,800; 08/659,683; 08/818,112; 08/818,111; and 09/056,559; and in the
WO97/09428 and WO97/09429 applications.

HTCC#1 (MTB40), the sequence of which is disclosed in SEQ ID NO:13
(DNA) and SEQ ID NO:14 (amino acid), as well as in the U.S. patent application Nos.
09/073,010; and 09/073,009; and in the PCT/US98/10407 and PCT/US98/10514
applications;

MTCC#2 (MTB41), the sequence of which is disclosed in SEQ ID NO:31 
(DNA) and SEQ ID NO:32 (amino acid), as well as in the U.S. patent application Nos. 
09/073,010; and 09/073,009; and in the WO98/53075 and WO98/53076 publications;

MTI (Mtb9.9A), the sequence of which is disclosed in SEQ ID NO:33 (DNA) 
and SEQ ID NO:34 (amino acid), as well as in the U.S. patent application Nos. 09/073,010; 
and 09/073,009; and in the WO98/53075 and WO98/53076 publications;

MSL (Mtb9.8), the sequence of which is disclosed in SEQ ID NO:35 (DNA)
and SEQ ID NO:36 (amino acid), as well as in the U.S. patent application Nos. 09/073,010; 
and 09/073,009; and in the WO98/53075 and WO98/53076 publications; 

DPV (Mtb8.4), the sequence of which is disclosed in SEQ ID NO:37 (DNA) 
and SEQ ID NO:38 (amino acid), and in the U.S. patent application Nos. 08/658,800;
08/659,683; 08/818,111; 08/818,112; as well as in the WO97/09428 and WO97/09429 
publications; 

ESAT-6 (Mtb8.4), the sequence of which is disclosed in SEQ ID NO:45 
(DNA) and SEQ ID NO:46 (amino acid), and in the U.S. patent application Nos. 08/658,800; 
08/659,683; 08/818,111; 08/818,112; as well as in the WO97/09428 and WO97/09429
publications; 

The following provides sequences of some additional antigens used in the 
compositions and fusion proteins of the invention: 

α-crystalline antigen, the sequence of which is disclosed in Verbon et al., J.
Bacteriol. 174:1352-1359 (1992);

85 complex antigen, the sequence of which is disclosed in Content et al.,
Infect. & Immunol. 59:3205-3212 (1991).

Each of the above sequences is also disclosed in Cole et al., Nature 393:537
(1998) and can be found at, e.g., http://www.sanger.ac.uk and http://www.pasteur.fr/mycdb/.

The above sequences are disclosed in U.S. patent applications Nos.
08/523,435; 08/523,436; 08/658,800; 08/659,683; 08/818,111; 08/818,112; 08/942,341;
08/942,578; 08/858,998; 08/859,381; 09/056,556; 09/072,596; 09/072,967; 09/073,009;
09/073,010; 09/223,040; 09/287,849; and in PCT patent applications PCT/US99/03265;
PCT/US99/03268; PCT/US99/07717; WO97/09428; WO97/09429; WO98/16645;
WO98/16646; WO98/53075; and WO98/53076, each of which is herein incorporated by
reference.

The antigens described herein include polymorphic variants and 
conservatively modified variations, as well as inter-strain and interspecies Mycobacterium 
homologs. In addition, the antigens described herein include subsequences or truncated 
sequences. The fusion proteins may also contain additional polypeptides, optionally
heterologous peptides from Mycobacterium or other sources. These antigens may be 
modified, for example, by adding linker peptide sequences as described below. These linker 
peptides may be inserted between one or more polypeptides which make up each of the 
fusion proteins.

II. DEFINITIONS

"Fusion polypeptide" or "fusion protein" refers to a protein having at least two 
heterologous Mycobacterium sp. polypeptides covalently linked, either directly or via an
amino acid linker. The polypeptides forming the fusion protein are typically linked
C-terminus to N-terminus, although they can also be linked C-terminus to C-terminus,
N-terminus to N-terminus, or N-terminus to C-terminus. The polypeptides of the fusion protein
can be in any order. This term also refers to conservatively modified variants, polymorphic
variants, alleles, mutants, subsequences, and interspecies homologs of the antigens that make
up the fusion protein. Mycobacterium tuberculosis antigens are described in Cole et al.,
Nature 393:537 (1998), which discloses the entire Mycobacterium tuberculosis genome. The
complete sequence of Mycobacterium tuberculosis can also be found at
http://www.sanger.ac.uk and at http://www.pasteur.fr/mycdb/ (MycDB). Antigens from other
Mycobacterium species that correspond to M. tuberculosis antigens can be identified, e.g.,
using sequence comparison algorithms, as described herein, or other methods known to those
of skill in the art, e.g., hybridization assays and antibody binding assays.

The term "TbF14" refers to a fusion protein having at least two antigenic,
heterologous polypeptides from Mycobacterium fused together. The two peptides are
referred to as MTb81 and Mo2. This term also refers to a fusion protein having
polymorphic variants, alleles, mutants, fragments, and interspecies homologs of MTb81
and Mo2. A nucleic acid encoding TbF14 specifically hybridizes under highly stringent
hybridization conditions to SEQ ID NO:1 and 3, which individually encode the MTb81 and
Mo2 antigens, respectively, and alleles, polymorphic variants, interspecies homologs,
subsequences, and conservatively modified variants thereof. A TbF14 fusion polypeptide
specifically binds to antibodies raised against MTb81 and Mo2, and alleles, polymorphic
variants, interspecies homologs, subsequences, and conservatively modified variants thereof
(optionally including an amino acid linker). The antibodies are polyclonal or monoclonal.
Optionally, the TbF14 fusion polypeptide specifically binds to antibodies raised against the
fusion junction of MTb81 and Mo2, which antibodies do not bind to MTb81 or Mo2
individually, i.e., when they are not part of a fusion protein. The individual polypeptides
of the fusion protein can be in any order. In some embodiments, the individual
polypeptides are in order (N- to C-terminus) from large to small. Large antigens are
approximately 30 to 150 kD in size, medium antigens are approximately 10 to 30 kD in
size, and small antigens are approximately less than 10 kD in size. The sequence encoding
the individual polypeptide may be, e.g., a fragment such as an individual CTL epitope
encoding about 8 to 9 amino acids. The fragment may also include multiple epitopes. The
fragment may also represent a larger part of the antigen sequence, e.g., about 50% or more
of MTb81 and Mo2.
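
The size classes and the large-to-small (N- to C-terminus) ordering described above can be illustrated with a short Python sketch. It is an illustration only and not part of the disclosed methods: the function names and the example molecular weights are hypothetical, and the handling of the 10 kD and 30 kD boundaries is an assumption, since the text gives only approximate ranges.

```python
from typing import List, Tuple


def size_class(kd: float) -> str:
    """Classify an antigen by approximate molecular weight in kD.

    Assumption: 10 kD counts as medium and 30 kD counts as large;
    the text itself only gives approximate ranges.
    """
    if kd <= 0:
        raise ValueError("molecular weight must be positive")
    if kd < 10:
        return "small"
    if kd < 30:
        return "medium"
    if kd <= 150:
        return "large"
    return "unclassified"  # above the approximate 150 kD upper bound


def order_large_to_small(antigens: List[Tuple[str, float]]) -> List[str]:
    """Return antigen names ordered N- to C-terminus from large to small."""
    ranked = sorted(antigens, key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked]


# Hypothetical antigens and weights, chosen only to exercise each class.
example = [("antigen_A", 9.0), ("antigen_B", 81.0), ("antigen_C", 14.0)]
for name, kd in example:
    print(name, kd, size_class(kd))
print(order_large_to_small(example))
```

Because the text also states that the polypeptides can be in any order, this ordering is one optional arrangement rather than a requirement.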

TbF14 optionally comprises additional polypeptides, optionally heterologous
polypeptides, fused to MTb81 and Mo2, optionally derived from Mycobacterium as well as
other sources, such as viral, bacterial, eukaryotic, invertebrate, vertebrate, and mammalian
sources. As described herein, the fusion protein can also be linked to other molecules,
including additional polypeptides.

The term "TbF15" refers to a fusion protein having at least four antigenic,
heterologous polypeptides from Mycobacterium fused together. The four peptides are
referred to as TbRa3, 38 kD, Tb38-1 (with the N-terminal cysteine), and FL TbH4. This
term also refers to a fusion protein having polymorphic variants, alleles, mutants, and
interspecies homologs of TbRa3, 38 kD, Tb38-1, and FL TbH4. A nucleic acid encoding
TbF15 specifically hybridizes under highly stringent hybridization conditions to SEQ ID
NO:5, 7, 9 and 11, individually encoding TbRa3, 38 kD, Tb38-1 and FL TbH4,
respectively, and alleles, fragments, polymorphic variants, interspecies homologs,
subsequences, and conservatively modified variants thereof. A TbF15 fusion polypeptide
specifically binds to antibodies raised against TbRa3, 38 kD, Tb38-1, and FL TbH4 and
alleles, polymorphic variants, interspecies homologs, subsequences, and conservatively
modified variants thereof (optionally including an amino acid linker). The antibodies are
polyclonal or monoclonal. Optionally, the TbF15 fusion polypeptide specifically binds to
antibodies raised against the fusion junction of TbRa3, 38 kD, Tb38-1, and FL TbH4, which
antibodies do not bind to TbRa3, 38 kD, Tb38-1, and FL TbH4 individually, i.e., when they
are not part of a fusion protein. The polypeptides of the fusion protein can be in any order.
In some embodiments, the individual polypeptides are in order (N- to C-terminus) from
large to small. Large antigens are approximately 30 to 150 kD in size, medium antigens
are approximately 10 to 30 kD in size, and small antigens are approximately less than 10
kD in size. The sequence encoding the individual polypeptide may be as small as, e.g., a
fragment such as an individual CTL epitope encoding about 8 to 9 amino acids. The
fragment may also include multiple epitopes. The fragment may also represent a larger
part of the antigen sequence, e.g., about 50% or more of TbRa3, 38 kD, Tb38-1, and FL
TbH4.

TbF15 optionally comprises additional polypeptides, optionally heterologous
polypeptides, fused to TbRa3, 38 kD, Tb38-1, and FL TbH4, optionally derived from
Mycobacterium as well as other sources such as viral, bacterial, eukaryotic, invertebrate,
vertebrate, and mammalian sources. As described herein, the fusion protein can also be
linked to other molecules, including additional polypeptides. The compositions of the
invention can also comprise additional polypeptides that are unlinked to the fusion proteins of
the invention. These additional polypeptides may be heterologous or homologous
polypeptides.

The "HTCC#1(FL)-TbH9(FL)," "HTCC#1(184-392)/TbH9/HTCC#1(1-129),"
"HTCC#1(1-149)/TbH9/HTCC#1(161-392)," and "HTCC#1(184-392)/TbH9/HTCC#1(1-200)"
fusion proteins refer to fusion proteins comprising at least two
antigenic, heterologous polypeptides from Mycobacterium fused together. The two peptides
are referred to as HTCC#1 and TbH9. This term also refers to fusion proteins having
polymorphic variants, alleles, mutants, and interspecies homologs of HTCC#1 and TbH9.
A nucleic acid encoding HTCC#1-TbH9, HTCC#1(184-392)/TbH9/HTCC#1(1-129),
HTCC#1(1-149)/TbH9/HTCC#1(161-392), or HTCC#1(184-392)/TbH9/HTCC#1(1-200)
specifically hybridizes under highly stringent hybridization conditions to SEQ ID NO:13
and 25, individually encoding HTCC#1 and TbH9, respectively, and alleles, fragments,
polymorphic variants, interspecies homologs, subsequences, and conservatively modified
variants thereof. A HTCC#1(FL)-TbH9(FL), HTCC#1(184-392)/TbH9/HTCC#1(1-129),
HTCC#1(1-149)/TbH9/HTCC#1(161-392), or HTCC#1(184-392)/TbH9/HTCC#1(1-200)
fusion polypeptide specifically binds to antibodies raised against HTCC#1 and TbH9, and
alleles, polymorphic variants, interspecies homologs, subsequences, and conservatively
modified variants thereof (optionally including an amino acid linker). The antibodies are
polyclonal or monoclonal. Optionally, the HTCC#1(FL)-TbH9(FL),
HTCC#1(184-392)/TbH9/HTCC#1(1-129), HTCC#1(1-149)/TbH9/HTCC#1(161-392), or
HTCC#1(184-392)/TbH9/HTCC#1(1-200) fusion polypeptide specifically binds to antibodies raised
against the fusion junction of the antigens, which antibodies do not bind to the antigens
individually, i.e., when they are not part of a fusion protein. The polypeptides of the fusion
protein can be in any order. In some embodiments, the individual polypeptides are in order
(N- to C-terminus) from large to small. Large antigens are approximately 30 to 150 kD in
size, medium antigens are approximately 10 to 30 kD in size, and small antigens are
approximately less than 10 kD in size. The sequence encoding the individual polypeptide
may be as small as, e.g., a fragment such as an individual CTL epitope encoding about 8 to
9 amino acids. The fragment may also include multiple epitopes. The fragment may also
represent a larger part of the antigen sequence, e.g., about 50% or more (e.g., full-length)
of HTCC#1 and TbH9.

HTCC#1(FL)-TbH9(FL), HTCC#1(184-392)/TbH9/HTCC#1(1-129),
HTCC#1(1-149)/TbH9/HTCC#1(161-392), and HTCC#1(184-392)/TbH9/HTCC#1(1-200)
optionally comprise additional polypeptides, optionally heterologous polypeptides, fused to
HTCC#1 and TbH9, optionally derived from Mycobacterium as well as other sources such
as viral, bacterial, eukaryotic, invertebrate, vertebrate, and mammalian sources. As
described herein, the fusion protein can also be linked to other molecules, including
additional polypeptides. The compositions of the invention can also comprise additional
polypeptides that are unlinked to the fusion proteins of the invention. These additional
polypeptides may be heterologous or homologous polypeptides.

The term "TbRa12-HTCC#1" refers to a fusion protein having at least two
antigenic, heterologous polypeptides from Mycobacterium fused together. The two peptides
are referred to as TbRa12 and HTCC#1. This term also refers to a fusion protein having
polymorphic variants, alleles, mutants, and interspecies homologs of TbRa12 and HTCC#1.
A nucleic acid encoding "TbRa12-HTCC#1" specifically hybridizes under highly stringent
hybridization conditions to SEQ ID NO:27 and 13, individually encoding TbRa12 and
HTCC#1, respectively, and alleles, fragments, polymorphic variants, interspecies
homologs, subsequences, and conservatively modified variants thereof. A
"TbRa12-HTCC#1" fusion polypeptide specifically binds to antibodies raised against TbRa12 and
HTCC#1 and alleles, polymorphic variants, interspecies homologs, subsequences, and
conservatively modified variants thereof (optionally including an amino acid linker). The
antibodies are polyclonal or monoclonal. Optionally, the "TbRa12-HTCC#1" fusion
polypeptide specifically binds to antibodies raised against the fusion junction of TbRa12 and
HTCC#1, which antibodies do not bind to TbRa12 and HTCC#1 individually, i.e., when
they are not part of a fusion protein. The polypeptides of the fusion protein can be in any
order. In some embodiments, the individual polypeptides are in order (N- to C-terminus)
from large to small. Large antigens are approximately 30 to 150 kD in size, medium
antigens are approximately 10 to 30 kD in size, and small antigens are approximately less
than 10 kD in size. The sequence encoding the individual polypeptide may be as small as,
e.g., a fragment such as an individual CTL epitope encoding about 8 to 9 amino acids. The
fragment may also include multiple epitopes. The fragment may also represent a larger
part of the antigen sequence, e.g., about 50% or more of TbRa12 and HTCC#1.

"TbRa12-HTCC#1" optionally comprises additional polypeptides, optionally
heterologous polypeptides, fused to TbRa12 and HTCC#1, optionally derived from
Mycobacterium as well as other sources such as viral, bacterial, eukaryotic, invertebrate,
vertebrate, and mammalian sources. As described herein, the fusion protein can also be
linked to other molecules, including additional polypeptides. The compositions of the
invention can also comprise additional polypeptides that are unlinked to the fusion proteins of
the invention. These additional polypeptides may be heterologous or homologous
polypeptides.

The terms "Mtb72F" and "Mtb59F" refer to fusion proteins of the invention
which hybridize under stringent conditions to at least two nucleotide sequences set forth in
SEQ ID NO:25 and 29, individually encoding the TbH9 (MTB39) and Ra35 (MTB32A)
antigens. The polynucleotide sequences encoding the individual antigens of the fusion
polypeptides therefore include conservatively modified variants, polymorphic variants,
alleles, mutants, subsequences, and interspecies homologs of TbH9 (MTB39) and Ra35
(MTB32A). The polynucleotide sequence encoding the individual polypeptides of the fusion
proteins can be in any order. In some embodiments, the individual polypeptides are in order
(N- to C-terminus) from large to small. Large antigens are approximately 30 to 150 kD in
size, medium antigens are approximately 10 to 30 kD in size, and small antigens are
approximately less than 10 kD in size. The sequence encoding the individual polypeptide
may be as small as, e.g., a fragment such as an individual CTL epitope encoding about 8 to 9
amino acids. The fragment may also include multiple epitopes. The fragment may also
represent a larger part of the antigen sequence, e.g., about 50% or more of TbH9 (MTB39)
and Ra35 (MTB32A), e.g., the N- and C-terminal portions of Ra35 (MTB32A).

An "Mtb72F" or "Mtb59F" fusion polypeptide of the invention specifically
binds to antibodies raised against at least two antigen polypeptides, wherein each antigen
polypeptide is selected from the group consisting of TbH9 (MTB39) and Ra35 (MTB32A).
The antibodies can be polyclonal or monoclonal. Optionally, the fusion polypeptide
specifically binds to antibodies raised against the fusion junction of the antigens, which
antibodies do not bind to the antigens individually, i.e., when they are not part of a fusion
protein. The fusion polypeptides optionally comprise additional polypeptides, e.g., three,
four, five, six, or more polypeptides, up to about 25 polypeptides, optionally heterologous
polypeptides or repeated homologous polypeptides, fused to the at least two heterologous
antigens. The additional polypeptides of the fusion protein are optionally derived from
Mycobacterium as well as other sources, such as other bacterial, viral, or invertebrate,
vertebrate, or mammalian sources. The individual polypeptides of the fusion protein can be
in any order. As described herein, the fusion protein can also be linked to other molecules,
including additional polypeptides. The compositions of the invention can also comprise
additional polypeptides that are unlinked to the fusion proteins of the invention. These
additional polypeptides may be heterologous or homologous polypeptides.

A polynucleotide sequence comprising a fusion protein of the invention
hybridizes under stringent conditions to at least two nucleotide sequences, each encoding an
antigen polypeptide selected from the group consisting of Mtb81, Mo2, TbRa3, 38 kD,
Tb38-1, TbH4, HTCC#1, TbH9, MTCC#2, MTI, MSL, TbRa35, DPV, DPEP, Erd14, TbRa12,
DPPD, ESAT-6, MTb82, MTb59, Mtb85 complex, and α-crystalline. The polynucleotide
sequences encoding the individual antigens of the fusion polypeptide therefore include
conservatively modified variants, polymorphic variants, alleles, mutants, subsequences, and
interspecies homologs of Mtb81, Mo2, TbRa3, 38 kD, Tb38-1, TbH4, HTCC#1, TbH9,
MTCC#2, MTI, MSL, TbRa35, DPV, DPEP, Erd14, TbRa12, DPPD, ESAT-6, MTb82,
MTb59, Mtb85 complex, and α-crystalline. The polynucleotide sequence encoding the
individual polypeptides of the fusion protein can be in any order. In some embodiments,
the individual polypeptides are in order (N- to C-terminus) from large to small. Large
antigens are approximately 30 to 150 kD in size, medium antigens are approximately 10 to
30 kD in size, and small antigens are approximately less than 10 kD in size. The sequence
encoding the individual polypeptide may be as small as, e.g., a fragment such as an
individual CTL epitope encoding about 8 to 9 amino acids. The fragment may also include
multiple epitopes. The fragment may also represent a larger part of the antigen sequence,
e.g., about 50% or more of Mtb81, Mo2, TbRa3, 38 kD, Tb38-1, TbH4, HTCC#1, TbH9,
MTCC#2, MTI, MSL, TbRa35, DPV, DPEP, Erd14, TbRa12, DPPD, ESAT-6, MTb82,
MTb59, Mtb85 complex, and α-crystalline.

A fusion polypeptide of the invention specifically binds to antibodies raised
against at least two antigen polypeptides, wherein each antigen polypeptide is selected from
the group consisting of Mtb81, Mo2, TbRa3, 38 kD, Tb38-1, TbH4, HTCC#1, TbH9,
MTCC#2, MTI, MSL, TbRa35, DPV, DPEP, Erd14, TbRa12, DPPD, ESAT-6, MTb82,
MTb59, Mtb85 complex, and α-crystalline. The antibodies can be polyclonal or monoclonal.
Optionally, the fusion polypeptide specifically binds to antibodies raised against the fusion
junction of the antigens, which antibodies do not bind to the antigens individually, i.e.,
when they are not part of a fusion protein. The fusion polypeptides optionally comprise
additional polypeptides, e.g., three, four, five, six, or more polypeptides, up to about 25
polypeptides, optionally heterologous polypeptides or repeated homologous polypeptides,
fused to the at least two heterologous antigens. The additional polypeptides of the fusion
protein are optionally derived from Mycobacterium as well as other sources, such as other
bacterial, viral, or invertebrate, vertebrate, or mammalian sources. The individual
polypeptides of the fusion protein can be in any order. As described herein, the fusion
protein can also be linked to other molecules, including additional polypeptides. The
compositions of the invention can also comprise additional polypeptides that are unlinked to
the fusion proteins of the invention. These additional polypeptides may be heterologous or
homologous polypeptides.
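
The notion of a fusion junction, i.e., a stretch of sequence that exists only once two antigens are joined, can be illustrated with a short Python sketch. It is an illustration under stated assumptions, not a method taken from this application: the function name is hypothetical, the toy sequences are not real antigen sequences, the fusion is assumed to be a simple C-terminus to N-terminus concatenation with an optional linker, and only linear windows of a fixed length are considered, whereas real antibody epitopes need not be linear.

```python
from typing import List, Tuple


def junction_peptides(first: str, second: str, linker: str = "",
                      k: int = 9) -> Tuple[str, List[str]]:
    """Build a fusion (first + linker + second) and list every k-residue
    window that does not lie wholly inside `first` or wholly inside `second`.
    """
    fusion = first + linker + second
    second_start = len(first) + len(linker)
    windows = []
    for start in range(len(fusion) - k + 1):
        end = start + k  # exclusive end index of the window
        inside_first = end <= len(first)
        inside_second = start >= second_start
        if not inside_first and not inside_second:
            windows.append(fusion[start:end])
    return fusion, windows


# Toy sequences (hypothetical, for illustration only).
fusion, peptides = junction_peptides("AAAAA", "CCCCC", k=3)
print(fusion)    # AAAAACCCCC
print(peptides)  # ['AAC', 'ACC']
```

The selection is purely positional, so a listed window could by coincidence also occur elsewhere in one of the parent sequences; a real analysis would need to check for that separately.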

The term "fused" refers to the covalent linkage between two polypeptides in a
fusion protein. The polypeptides are typically joined via a peptide bond, either directly to
each other or via an amino acid linker. Optionally, the peptides can be joined via non-peptide
covalent linkages known to those of skill in the art.

"FL" refers to full-length, i.e., a polypeptide that is the same length as the 
wild-type polypeptide. 

The term "immunogenic fragment thereof" refers to a polypeptide comprising
an epitope that is recognized by cytotoxic T lymphocytes, helper T lymphocytes or B cells.

The term "Mycobacterium species of the tuberculosis complex" includes those
species traditionally considered as causing the disease tuberculosis, as well as Mycobacterium
environmental and opportunistic species that cause tuberculosis and lung disease in immune
compromised patients, such as patients with AIDS, e.g., M. tuberculosis, M. bovis, or M.
africanum, BCG, M. avium, M. intracellulare, M. celatum, M. genavense, M. haemophilum,
M. kansasii, M. simiae, M. vaccae, M. fortuitum, and M. scrofulaceum (see, e.g., Harrison's
Principles of Internal Medicine, volume 1, pp. 1004-1014 and 1019-1023 (14th ed., Fauci et
al., eds., 1998)).

An adjuvant refers to the components in a vaccine or therapeutic composition
that increase the specific immune response to the antigen (see, e.g., Edelman, AIDS Res. Hum.
Retroviruses 8:1409-1411 (1992)). Adjuvants induce immune responses of the Th1-type and
Th-2 type. Th1-type cytokines (e.g., IFN-γ, IL-2, and IL-12) tend to favor the
induction of cell-mediated immune response to an administered antigen, while Th-2 type
cytokines (e.g., IL-4, IL-5, IL-6, IL-10 and TNF-β) tend to favor the induction of humoral
immune responses.

"Nucleic acid" refers to deoxyribonucleotides or ribonucleotides and polymers
thereof in either single- or double-stranded form. The term encompasses nucleic acids
containing known nucleotide analogs or modified backbone residues or linkages, which are
synthetic, naturally occurring, and non-naturally occurring, which have similar binding
properties as the reference nucleic acid, and which are metabolized in a manner similar to the
reference nucleotides. Examples of such analogs include, without limitation,
phosphorothioates, phosphoramidates, methyl phosphonates, chiral-methyl phosphonates,
2-O-methyl ribonucleotides, peptide-nucleic acids (PNAs).

Unless otherwise indicated, a particular nucleic acid sequence also implicitly
encompasses conservatively modified variants thereof (e.g., degenerate codon substitutions)
and complementary sequences, as well as the sequence explicitly indicated. Specifically,
degenerate codon substitutions may be achieved by generating sequences in which the third
position of one or more selected (or all) codons is substituted with mixed-base and/or
deoxyinosine residues (Batzer et al., Nucleic Acid Res. 19:5081 (1991); Ohtsuka et al., J.
Biol. Chem. 260:2605-2608 (1985); Rossolini et al., Mol. Cell. Probes 8:91-98 (1994)). The
term nucleic acid is used interchangeably with gene, cDNA, mRNA, oligonucleotide, and
polynucleotide.

The terms "polypeptide," "peptide" and "protein" are used interchangeably
herein to refer to a polymer of amino acid residues. The terms apply to amino acid polymers
in which one or more amino acid residues is an artificial chemical mimetic of a corresponding
naturally occurring amino acid, as well as to naturally occurring amino acid polymers and
non-naturally occurring amino acid polymers.

The term "amino acid" refers to naturally occurring and synthetic amino acids,
as well as amino acid analogs and amino acid mimetics that function in a manner similar to
the naturally occurring amino acids. Naturally occurring amino acids are those encoded by
the genetic code, as well as those amino acids that are later modified, e.g., hydroxyproline,
γ-carboxyglutamate, and O-phosphoserine. Amino acid analogs refers to compounds that have
the same basic chemical structure as a naturally occurring amino acid, i.e., an α carbon that is
bound to a hydrogen, a carboxyl group, an amino group, and an R group, e.g., homoserine,
norleucine, methionine sulfoxide, methionine methyl sulfonium. Such analogs have modified
R groups (e.g., norleucine) or modified peptide backbones, but retain the same basic chemical
structure as a naturally occurring amino acid. Amino acid mimetics refers to chemical
compounds that have a structure that is different from the general chemical structure of an
amino acid, but that functions in a manner similar to a naturally occurring amino acid.

Amino acids may be referred to herein by either their commonly known three
letter symbols or by the one-letter symbols recommended by the IUPAC-IUB Biochemical
Nomenclature Commission. Nucleotides, likewise, may be referred to by their commonly
accepted single-letter codes.

"Conservatively modified variants" applies to both amino acid and nucleic
acid sequences. With respect to particular nucleic acid sequences, conservatively modified
variants refers to those nucleic acids which encode identical or essentially identical amino
acid sequences, or where the nucleic acid does not encode an amino acid sequence, to
essentially identical sequences. Because of the degeneracy of the genetic code, a large
number of functionally identical nucleic acids encode any given protein. For instance, the
codons GCA, GCC, GCG and GCU all encode the amino acid alanine. Thus, at every
position where an alanine is specified by a codon, the codon can be altered to any of the
corresponding codons described without altering the encoded polypeptide. Such nucleic acid
variations are "silent variations," which are one species of conservatively modified
variations. Every nucleic acid sequence herein which encodes a polypeptide also describes
every possible silent variation of the nucleic acid. One of skill will recognize that each codon
in a nucleic acid (except AUG, which is ordinarily the only codon for methionine, and TGG,
which is ordinarily the only codon for tryptophan) can be modified to yield a functionally
identical molecule. Accordingly, each silent variation of a nucleic acid which encodes a
polypeptide is implicit in each described sequence.
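
The idea of silent variations can be made concrete with a short Python sketch that enumerates every synonymous coding sequence for a short open reading frame. It is an illustration only: the function and variable names are hypothetical, the standard genetic code is assumed, and codons are written in the DNA alphabet (T rather than U), so the alanine codon GCU in the text appears here as GCT.

```python
from itertools import product
from math import prod

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Group codons by the amino acid (or stop signal '*') they encode.
SYNONYMS = {}
for codon, residue in CODON_TABLE.items():
    SYNONYMS.setdefault(residue, []).append(codon)


def codons(dna: str):
    """Split a coding sequence into complete codons."""
    return [dna[i:i + 3] for i in range(0, len(dna) - len(dna) % 3, 3)]


def translate(dna: str) -> str:
    """Translate a DNA coding sequence into one-letter amino acid codes."""
    return "".join(CODON_TABLE[c] for c in codons(dna))


def count_silent_variants(dna: str) -> int:
    """Number of coding sequences (including the input) with the same translation."""
    return prod(len(SYNONYMS[CODON_TABLE[c]]) for c in codons(dna))


def silent_variants(dna: str):
    """Yield every coding sequence that encodes the same polypeptide."""
    choices = [SYNONYMS[CODON_TABLE[c]] for c in codons(dna)]
    for combination in product(*choices):
        yield "".join(combination)


orf = "ATGGCATGG"  # Met-Ala-Trp; only the alanine codon can vary
print(translate(orf), count_silent_variants(orf))
print(sorted(silent_variants(orf)))
```

In this example only the alanine codon has alternatives, which matches the statement that the methionine and tryptophan codons are ordinarily unique. The count grows multiplicatively with sequence length, so full enumeration is practical only for short sequences.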

As to amino acid sequences, one of skill will recognize that individual 
substitutions, deletions or additions to a nucleic acid, peptide, polypeptide, or protein 
sequence which alters, adds or deletes a single amino acid or a small percentage of amino 
acids in the encoded sequence is a "conservatively modified variant" where the alteration 
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I 

results in the substitution of an amino acid with a chemically similar amino acid. 
Conservative substitution tables providing functionally similar amino acids are well known in 
the art. Such conservatively modified variants are in addition to and do not exclude 
polymorphic variants, interspecies homologs, and alleles of the invention. 

The following eight groups each contain amino acids that are conservative
substitutions for one another:

1) Alanine (A), Glycine (G);
2) Aspartic acid (D), Glutamic acid (E);
3) Asparagine (N), Glutamine (Q);
4) Arginine (R), Lysine (K);
5) Isoleucine (I), Leucine (L), Methionine (M), Valine (V);
6) Phenylalanine (F), Tyrosine (Y), Tryptophan (W);
7) Serine (S), Threonine (T); and
8) Cysteine (C), Methionine (M)

(see, e.g., Creighton, Proteins (1984)).
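
The eight groups above can be expressed as a simple lookup. The sketch below is illustrative only; the names `CONSERVATIVE_GROUPS` and `is_conservative` are hypothetical, and the choice to treat an unchanged residue as "not a substitution" is an assumption.

```python
# The eight conservative substitution groups listed above, as sets of
# one-letter amino acid codes. Methionine (M) appears in groups 5 and 8.
CONSERVATIVE_GROUPS = [
    set("AG"),    # 1) Alanine, Glycine
    set("DE"),    # 2) Aspartic acid, Glutamic acid
    set("NQ"),    # 3) Asparagine, Glutamine
    set("RK"),    # 4) Arginine, Lysine
    set("ILMV"),  # 5) Isoleucine, Leucine, Methionine, Valine
    set("FYW"),   # 6) Phenylalanine, Tyrosine, Tryptophan
    set("ST"),    # 7) Serine, Threonine
    set("CM"),    # 8) Cysteine, Methionine
]


def is_conservative(original, replacement):
    """True if two different residues share at least one group above."""
    a, b = original.upper(), replacement.upper()
    if a == b:
        return False  # assumption: an unchanged residue is not a substitution
    return any(a in group and b in group for group in CONSERVATIVE_GROUPS)


print(is_conservative("I", "L"))  # both in group 5
print(is_conservative("A", "W"))  # no shared group
```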

The term "heterologous" when used with reference to portions of a nucleic 
acid indicates that the nucleic acid comprises two or more subsequences that are not found in 
the same relationship to each other in nature. For instance, the nucleic acid is typically 
recombinantly produced, having two or more sequences from unrelated genes arranged to
make a new functional nucleic acid, e.g., a promoter from one source and a coding region
from another source. Similarly, a heterologous protein indicates that the protein comprises 
two or more subsequences that are not found in the same relationship to each other in nature 
(e.g., a fusion protein). 

The phrase "selectively (or specifically) hybridizes to" refers to the binding,
duplexing, or hybridizing of a molecule only to a particular nucleotide sequence under
stringent hybridization conditions when that sequence is present in a complex mixture (e.g.,
total cellular or library DNA or RNA). 

The phrase "stringent hybridization conditions" refers to conditions under
which a probe will hybridize to its target subsequence, typically in a complex mixture of
nucleic acid, but to no other sequences. Stringent conditions are sequence-dependent and
will be different in different circumstances. Longer sequences hybridize specifically at
higher temperatures. An extensive guide to the hybridization of nucleic acids is found in
Tijssen, Techniques in Biochemistry and Molecular Biology: Hybridization with Nucleic
Probes, "Overview of principles of hybridization and the strategy of nucleic acid assays"
(1993). Generally, stringent conditions are selected to be about 5-10°C lower than the
thermal melting point (Tm) for the specific sequence at a defined ionic strength and pH. The Tm is
the temperature (under defined ionic strength, pH, and nucleic acid concentration) at which 50%
of the probes complementary to the target hybridize to the target sequence at equilibrium (as
the target sequences are present in excess, at Tm, 50% of the probes are occupied at
equilibrium). Stringent conditions will be those in which the salt concentration is less than
about 1.0 M sodium ion, typically about 0.01 to 1.0 M sodium ion concentration (or other
salts) at pH 7.0 to 8.3 and the temperature is at least about 30°C for short probes (e.g., 10 to
50 nucleotides) and at least about 60°C for long probes (e.g., greater than 50 nucleotides).
Stringent conditions may also be achieved with the addition of destabilizing agents such as
formamide. For selective or specific hybridization, a positive signal is at least two times
background, optionally 10 times background hybridization. Exemplary stringent
hybridization conditions can be as follows: 50% formamide, 5x SSC, and 1% SDS,
incubating at 42°C, or 5x SSC, 1% SDS, incubating at 65°C, with wash in 0.2x SSC and
0.1% SDS at 65°C.
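
The general ranges recited above can be captured in a small check. This is a sketch under stated assumptions: the text qualifies each figure with "about", so the hard cutoffs below are a simplification, and all function names are hypothetical.

```python
def stringent_temperature_range(tm_c):
    """Temperature window about 5-10 degrees C below the melting point Tm."""
    return (tm_c - 10.0, tm_c - 5.0)


def meets_stringent_criteria(sodium_molar, ph, temp_c, probe_length_nt):
    """Compare conditions with the general ranges quoted in the text.

    Assumption: the approximate limits ("about 1.0 M", "about 30 C", ...)
    are treated as exact thresholds for the purpose of illustration.
    """
    salt_ok = sodium_molar < 1.0
    ph_ok = 7.0 <= ph <= 8.3
    # Short probes (up to 50 nucleotides): at least 30 C; longer: at least 60 C.
    min_temp_c = 30.0 if probe_length_nt <= 50 else 60.0
    return salt_ok and ph_ok and temp_c >= min_temp_c


def is_positive_signal(signal, background, fold=2.0):
    """A positive signal is at least `fold` times background (2x by default)."""
    return signal >= fold * background


print(stringent_temperature_range(68.0))
print(meets_stringent_criteria(0.5, 7.5, 42.0, 25))
print(is_positive_signal(250.0, 100.0))
```

Passing `fold=10.0` to `is_positive_signal` corresponds to the optional "10 times background" criterion.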

Nucleic acids that do not hybridize to each other under stringent conditions are 
still substantially identical if the polypeptides which they encode are substantially identical. 
This occurs, for example, when a copy of a nucleic acid is created using the maximum codon 
degeneracy permitted by the genetic code. In such cases, the nucleic acids typically hybridize
under moderately stringent hybridization conditions. Exemplary "moderately stringent
hybridization conditions" include a hybridization in a buffer of 40% formamide, 1 M NaCl,
1% SDS at 37°C, and a wash in 1X SSC at 45°C. A positive hybridization is at least twice
background. Those of ordinary skill will readily recognize that alternative hybridization and 
wash conditions can be utilized to provide conditions of similar stringency. 

"Antibody" refers to a polypeptide comprising a framework region from an
immunoglobulin gene or fragments thereof that specifically binds and recognizes an antigen.
The recognized immunoglobulin genes include the kappa, lambda, alpha, gamma, delta,
epsilon, and mu constant region genes, as well as the myriad immunoglobulin variable region
genes. Light chains are classified as either kappa or lambda. Heavy chains are classified as
gamma, mu, alpha, delta, or epsilon, which in turn define the immunoglobulin classes, IgG,
IgM, IgA, IgD and IgE, respectively.

An exemplary immunoglobulin (antibody) structural unit comprises a
tetramer. Each tetramer is composed of two identical pairs of polypeptide chains, each pair
having one "light" (about 25 kDa) and one "heavy" chain (about 50-70 kDa). The N-terminus
of each chain defines a variable region of about 100 to 110 or more amino acids
primarily responsible for antigen recognition. The terms variable light chain (VL) and
variable heavy chain (VH) refer to these light and heavy chains respectively.

Antibodies exist, e.g., as intact immunoglobulins or as a number of well-characterized
fragments produced by digestion with various peptidases. Thus, for example,
pepsin digests an antibody below the disulfide linkages in the hinge region to produce F(ab)'2,
a dimer of Fab which itself is a light chain joined to VH-CH1 by a disulfide bond. The F(ab)'2
may be reduced under mild conditions to break the disulfide linkage in the hinge region,
thereby converting the F(ab)'2 dimer into an Fab' monomer. The Fab' monomer is
essentially Fab with part of the hinge region (see Fundamental Immunology (Paul ed., 3d ed.
1993)). While various antibody fragments are defined in terms of the digestion of an intact
antibody, one of skill will appreciate that such fragments may be synthesized de novo either
chemically or by using recombinant DNA methodology. Thus, the term antibody, as used
herein, also includes antibody fragments either produced by the modification of whole
antibodies, or those synthesized de novo using recombinant DNA methodologies (e.g., single
chain Fv) or those identified using phage display libraries (see, e.g., McCafferty et al., Nature
348:552-554 (1990)).

For preparation of monoclonal or polyclonal antibodies, any technique known
in the art can be used (see, e.g., Kohler & Milstein, Nature 256:495-497 (1975); Kozbor et
al., Immunology Today 4:72 (1983); Cole et al., pp. 77-96 in Monoclonal Antibodies and
Cancer Therapy (1985)). Techniques for the production of single chain antibodies (U.S.
Patent 4,946,778) can be adapted to produce antibodies to polypeptides of this invention.
Also, transgenic mice, or other organisms such as other mammals, may be used to express
humanized antibodies. Alternatively, phage display technology can be used to identify
antibodies and heteromeric Fab fragments that specifically bind to selected antigens (see, e.g.,
McCafferty et al., Nature 348:552-554 (1990); Marks et al., Biotechnology 10:779-783
(1992)).

The phrase "specifically (or selectively) binds" to an antibody or "specifically
(or selectively) immunoreactive with," when referring to a protein or peptide, refers to a
binding reaction that is determinative of the presence of the protein in a heterogeneous
population of proteins and other biologics. Thus, under designated immunoassay conditions,
the specified antibodies bind to a particular protein at least two times the background and do
not substantially bind in a significant amount to other proteins present in the sample. Specific
binding to an antibody under such conditions may require an antibody that is selected for its
specificity for a particular protein. For example, polyclonal antibodies raised to fusion
proteins can be selected to obtain only those polyclonal antibodies that are specifically
immunoreactive with the fusion protein and not with individual components of the fusion
proteins. This selection may be achieved by subtracting out antibodies that cross-react with
the individual antigens. A variety of immunoassay formats may be used to select antibodies
specifically immunoreactive with a particular protein. For example, solid-phase ELISA
immunoassays are routinely used to select antibodies specifically immunoreactive with a
protein (see, e.g., Harlow & Lane, Antibodies, A Laboratory Manual (1988), for a description
of immunoassay formats and conditions that can be used to determine specific
immunoreactivity). Typically a specific or selective reaction will be at least twice
background signal or noise and more typically more than 10 to 100 times background.

Polynucleotides may comprise a native sequence (i.e., an endogenous 
sequence that encodes an individual antigen or a portion thereof) or may comprise a variant 
of such a sequence. Polynucleotide variants may contain one or more substitutions,
additions, deletions and/or insertions such that the biological activity of the encoded fusion
polypeptide is not diminished, relative to a fusion polypeptide comprising native antigens. 
Variants preferably exhibit at least about 70% identity, more preferably at least about 80% 
identity and most preferably at least about 90% identity to a polynucleotide sequence that 
encodes a native polypeptide or a portion thereof. 

The terms "identical" or percent "identity," in the context of two or more
nucleic acids or polypeptide sequences, refer to two or more sequences or subsequences that
are the same or have a specified percentage of amino acid residues or nucleotides that are the
same (i.e., 70% identity, optionally 75%, 80%, 85%, 90%, or 95% identity over a specified
region), when compared and aligned for maximum correspondence over a comparison
window, or designated region as measured using one of the following sequence comparison
algorithms or by manual alignment and visual inspection. Such sequences are then said to be
"substantially identical." This definition also refers to the complement of a test sequence.
Optionally, the identity exists over a region that is at least about 25 to about 50 amino acids
or nucleotides in length, or optionally over a region that is 75-100 amino acids or nucleotides
in length.

For sequence comparison, typically one sequence acts as a reference sequence, 
to which test sequences are compared. When using a sequence comparison algorithm, test 
and reference sequences are entered into a computer, subsequence coordinates are designated, 
if necessary, and sequence algorithm program parameters are designated. Default program 
parameters can be used, or alternative parameters can be designated. The sequence
comparison algorithm then calculates the percent sequence identities for the test sequences 
relative to the reference sequence, based on the program parameters. 
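
Once two sequences have been aligned, the percent identity calculation itself is simple. The sketch below assumes the alignment has already been produced (gaps written as "-") and, as one convention among several, divides by the full alignment length; the function name `percent_identity` is hypothetical.

```python
def percent_identity(aligned_reference, aligned_test, gap="-"):
    """Percent identity between two pre-aligned sequences of equal length.

    Assumption: the denominator is the alignment length, gaps included.
    Other conventions (e.g., dividing by the shorter sequence length)
    exist and would give different values.
    """
    if len(aligned_reference) != len(aligned_test):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_reference:
        raise ValueError("aligned sequences must not be empty")
    matches = sum(
        1
        for ref, test in zip(aligned_reference, aligned_test)
        if ref == test and ref != gap
    )
    return 100.0 * matches / len(aligned_reference)


print(percent_identity("ACGTACGT", "ACGTTCGT"))  # 7 of 8 positions match
print(percent_identity("ACG-TA", "ACGTTA"))      # 5 of 6 positions match
```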

A "comparison window", as used herein, includes reference to a segment of
any one of the number of contiguous positions selected from the group consisting of from 25
to 500, usually about 50 to about 200, more usually about 100 to about 150 in which a
sequence may be compared to a reference sequence of the same number of contiguous
positions after the two sequences are optimally aligned. Methods of alignment of sequences
for comparison are well-known in the art. Optimal alignment of sequences for comparison
can be conducted, e.g., by the local homology algorithm of Smith & Waterman, Adv. Appl.
Math. 2:482 (1981), by the homology alignment algorithm of Needleman & Wunsch, J. Mol.
Biol. 48:443 (1970), by the search for similarity method of Pearson & Lipman, Proc. Natl.
Acad. Sci. USA 85:2444 (1988), by computerized implementations of these algorithms (GAP,
BESTFIT, FASTA, and TFASTA in the Wisconsin Genetics Software Package, Genetics
Computer Group, 575 Science Dr., Madison, WI), or by manual alignment and visual
inspection (see, e.g., Current Protocols in Molecular Biology (Ausubel et al., eds. 1995
supplement)).
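
To make the local homology approach concrete, the following minimal sketch computes a Smith-Waterman style local alignment score with a linear gap penalty. It is not the GCG implementation; the scoring values (match, mismatch, gap) and the function name are assumptions chosen for illustration.

```python
def smith_waterman_score(seq_a, seq_b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between two sequences.

    Uses the standard dynamic-programming recurrence with a linear gap
    penalty; the scoring parameters are illustrative assumptions.
    """
    previous = [0] * (len(seq_b) + 1)
    best = 0
    for i in range(1, len(seq_a) + 1):
        current = [0] * (len(seq_b) + 1)
        for j in range(1, len(seq_b) + 1):
            pair = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
            current[j] = max(
                0,                       # start a new local alignment
                previous[j - 1] + pair,  # align the two residues
                previous[j] + gap,       # gap in seq_b
                current[j - 1] + gap,    # gap in seq_a
            )
            best = max(best, current[j])
        previous = current
    return best


# The shared core "ACGT" gives four matches at 2 points each.
print(smith_waterman_score("TTACGTTT", "GGACGTGG"))
```

Only two rows of the scoring matrix are kept in memory because the sketch returns the score alone; recovering the alignment itself would require a traceback over the full matrix.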

One example of a useful algorithm is PILEUP. PILEUP creates a multiple
sequence alignment from a group of related sequences using progressive, pairwise alignments
to show relationship and percent sequence identity. It also plots a tree or dendrogram showing
the clustering relationships used to create the alignment. PILEUP uses a simplification of the
progressive alignment method of Feng & Doolittle, J. Mol. Evol. 35:351-360 (1987). The
method used is similar to the method described by Higgins & Sharp, CABIOS 5:151-153
(1989). The program can align up to 300 sequences, each of a maximum length of 5,000
nucleotides or amino acids. The multiple alignment procedure begins with the pairwise
alignment of the two most similar sequences, producing a cluster of two aligned sequences.
This cluster is then aligned to the next most related sequence or cluster of aligned sequences.
Two clusters of sequences are aligned by a simple extension of the pairwise alignment of two
individual sequences. The final alignment is achieved by a series of progressive, pairwise
alignments. The program is run by designating specific sequences and their amino acid or
nucleotide coordinates for regions of sequence comparison and by designating the program
parameters. Using PILEUP, a reference sequence is compared to other test sequences to
determine the percent sequence identity relationship using the following parameters: default
gap weight (3.00), default gap length weight (0.10), and weighted end gaps. PILEUP can be
obtained from the GCG sequence analysis software package, e.g., version 7.0 (Devereaux et
al., Nuc. Acids Res. 12:387-395 (1984)).

Other examples of algorithms that are suitable for determining percent
sequence identity and sequence similarity are the BLAST and BLAST 2.0 algorithms, which
are described in Altschul et al., Nuc. Acids Res. 25:3389-3402 (1997) and Altschul et al., J.
Mol. Biol. 215:403-410 (1990), respectively. Software for performing BLAST analyses is
publicly available through the National Center for Biotechnology Information
(http://www.ncbi.nlm.nih.gov/). This algorithm involves first identifying high scoring
sequence pairs (HSPs) by identifying short words of length W in the query sequence, which
either match or satisfy some positive-valued threshold score T when aligned with a word of
the same length in a database sequence. T is referred to as the neighborhood word score
threshold (Altschul et al., supra). These initial neighborhood word hits act as seeds for
initiating searches to find longer HSPs containing them. The word hits are extended in both
directions along each sequence for as far as the cumulative alignment score can be increased.
Cumulative scores are calculated using, for nucleotide sequences, the parameters M (reward
score for a pair of matching residues; always > 0) and N (penalty score for mismatching
residues; always < 0). For amino acid sequences, a scoring matrix is used to calculate the
cumulative score. Extension of the word hits in each direction is halted when: the
cumulative alignment score falls off by the quantity X from its maximum achieved value; the
cumulative score goes to zero or below, due to the accumulation of one or more negative-scoring
residue alignments; or the end of either sequence is reached. The BLAST algorithm
parameters W, T, and X determine the sensitivity and speed of the alignment. The BLASTN
program (for nucleotide sequences) uses as defaults a wordlength (W) of 11, an expectation
(E) of 10, M=5, N=-4 and a comparison of both strands. For amino acid sequences, the
BLASTP program uses as defaults a wordlength of 3, an expectation (E) of 10, and the
BLOSUM62 scoring matrix (see Henikoff & Henikoff, Proc. Natl. Acad. Sci. USA 89:10915
(1989)), alignments (B) of 50, expectation (E) of 10, M=5, N=-4, and a comparison of both
strands.
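
The three halting conditions for extending a word hit can be sketched as follows. This is a simplified, ungapped, one-directional illustration and not the NCBI implementation; M and N take the nucleotide defaults quoted above, while the X value and the function name `extend_right` are assumptions.

```python
def extend_right(query, subject, q_pos, s_pos, seed_score, M=5, N=-4, X=20):
    """Extend a word hit to the right without gaps.

    q_pos and s_pos are the first positions after the seed word in each
    sequence. Returns (best_score, residues_added_at_best_score).
    X=20 is an illustrative assumption, not a documented default.
    """
    score = best = seed_score
    best_length = 0
    offset = 0
    # Halting condition 3: the end of either sequence is reached.
    while q_pos + offset < len(query) and s_pos + offset < len(subject):
        score += M if query[q_pos + offset] == subject[s_pos + offset] else N
        offset += 1
        if score > best:
            best, best_length = score, offset
        # Halting condition 1: score fell off by X from its maximum.
        # Halting condition 2: cumulative score went to zero or below.
        if best - score >= X or score <= 0:
            break
    return best, best_length


# Seed word "ACGT" (4 matches at M=5, so seed_score=20); a word length of 4
# is used here only to keep the example short.
print(extend_right("ACGTACGTTTTT", "ACGTACGAAAAA", 4, 4, 20))
```

In this example three further matches raise the score from 20 to 35, after which successive mismatches lower it until the drop from the maximum reaches X.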

The BLAST algorithm also performs a statistical analysis of the similarity
between two sequences (see, e.g., Karlin & Altschul, Proc. Natl. Acad. Sci. USA 90:5873-
5787 (1993)). One measure of similarity provided by the BLAST algorithm is the smallest
sum probability (P(N)), which provides an indication of the probability by which a match
between two nucleotide or amino acid sequences would occur by chance. For example, a
nucleic acid is considered similar to a reference sequence if the smallest sum probability in a
comparison of the test nucleic acid to the reference nucleic acid is less than about 0.2, more
preferably less than about 0.01, and most preferably less than about 0.001.



III. POLYNUCLEOTIDE COMPOSITIONS 

As used herein, the terms "DNA segment" and "polynucleotide" refer to a
DNA molecule that has been isolated free of total genomic DNA of a particular species.
Therefore, a DNA segment encoding a polypeptide refers to a DNA segment that contains
one or more coding sequences yet is substantially isolated away from, or purified free from,
total genomic DNA of the species from which the DNA segment is obtained. Included within
the terms "DNA segment" and "polynucleotide" are DNA segments and smaller fragments of
such segments, and also recombinant vectors, including, for example, plasmids, cosmids,
phagemids, phage, viruses, and the like.

As will be understood by those skilled in the art, the DNA segments of this 
invention can include genomic sequences, extra-genomic and plasmid-encoded sequences 
and smaller engineered gene segments that express, or may be adapted to express, proteins,
polypeptides, peptides and the like. Such segments may be naturally isolated, or modified
synthetically by the hand of man. 

"Isolated," as used herein, means that a polynucleotide is substantially away 
from other coding sequences, and that the DNA segment does not contain large portions of 
unrelated coding DNA, such as large chromosomal fragments or other functional genes or
polypeptide coding regions. Of course, this refers to the DNA segment as originally isolated,
and does not exclude genes or coding regions later added to the segment by the hand of man. 

As will be recognized by the skilled artisan, polynucleotides may be single- 
stranded (coding or antisense) or double-stranded, and may be DNA (genomic, cDNA or 
synthetic) or RNA molecules. RNA molecules include HnRNA molecules, which contain
introns and correspond to a DNA molecule in a one-to-one manner, and mRNA molecules,
which do not contain introns. Additional coding or non-coding sequences may, but need not, 
be present within a polynucleotide of the present invention, and a polynucleotide may, but 
need not, be linked to other molecules and/or support materials. 

Polynucleotides may comprise a native sequence (i.e., an endogenous
sequence that encodes a Mycobacterium antigen or a portion thereof) or may comprise a
variant, or a biological or antigenic functional equivalent of such a sequence. Polynucleotide
variants may contain one or more substitutions, additions, deletions and/or insertions, as
further described below, preferably such that the immunogenicity of the encoded polypeptide
is not diminished, relative to a native tumor protein. The effect on the immunogenicity of the
encoded polypeptide may generally be assessed as described herein. The term "variants" also
encompasses homologous genes of xenogenic origin.

In additional embodiments, the present invention provides isolated 
polynucleotides and polypeptides comprising various lengths of contiguous stretches of 
sequence identical to or complementary to one or more of the sequences disclosed herein. 
For example, polynucleotides are provided by this invention that comprise at least about 15, 
20, 30, 40, 50, 75, 100, 150, 200, 300, 400, 500 or 1000 or more contiguous nucleotides of 
one or more of the sequences disclosed herein as well as all intermediate lengths there 
between. It will be readily understood that "intermediate lengths", in this context, means any 
length between the quoted values, such as 16, 17, 18, 19, etc.; 21, 22, 23, etc.; 30, 31, 32, etc.; 
50, 51, 52, 53, etc.; 100, 101, 102, 103, etc.; 150, 151, 152, 153, etc.; including all integers 
through 200-500; 500-1,000, and the like. 

The polynucleotides of the present invention, or fragments thereof, regardless 
of the length of the coding sequence itself, may be combined with other DNA sequences, 
such as promoters, polyadenylation signals, additional restriction enzyme sites, multiple 
cloning sites, other coding segments, and the like, such that their overall length may vary 
considerably. It is therefore contemplated that a nucleic acid fragment of almost any length 
may be employed, with the total length preferably being limited by the ease of preparation 
and use in the intended recombinant DNA protocol. For example, illustrative DNA segments 
with total lengths of about 10,000, about 5000, about 3000, about 2,000, about 1,000, about 
500, about 200, about 100, about 50 base pairs in length, and the like, (including all 
intermediate lengths) are contemplated to be useful in many implementations of this 
invention. 

Moreover, it will be appreciated by those of ordinary skill in the art that, as a 
result of the degeneracy of the genetic code, there are many nucleotide sequences that encode 
a polypeptide as described herein. Some of these polynucleotides bear minimal homology to 
the nucleotide sequence of any native gene. Nonetheless, polynucleotides that vary due to 
differences in codon usage are specifically contemplated by the present invention, for 
example polynucleotides that are optimized for human and/or primate codon selection. 
Further, alleles of the genes comprising the polynucleotide sequences provided herein are 
within the scope of the present invention. Alleles are endogenous genes that are altered as a 
result of one or more mutations, such as deletions, additions and/or substitutions of 
nucleotides. The resulting mRNA and protein may, but need not, have an altered structure or 
function. Alleles may be identified using standard techniques (such as hybridization,
amplification and/or database sequence comparison). 

IV. POLYNUCLEOTIDE IDENTIFICATION AND CHARACTERIZATION 

Polynucleotides may be identified, prepared and/or manipulated using any of a
variety of well established techniques. For example, a polynucleotide may be identified, as
described in more detail below, by screening a microarray of cDNAs for tumor-associated
expression (i.e., expression that is at least two fold greater in a tumor than in normal tissue, as
determined using a representative assay provided herein). Such screens may be performed,
for example, using a Synteni microarray (Palo Alto, CA) according to the manufacturer's
instructions (and essentially as described by Schena et al., Proc. Natl. Acad. Sci. USA
93:10614-10619 (1996) and Heller et al., Proc. Natl. Acad. Sci. USA 94:2150-2155 (1997)).
Alternatively, polynucleotides may be amplified from cDNA prepared from cells expressing
the proteins described herein, such as M. tuberculosis cells. Such polynucleotides may be
amplified via polymerase chain reaction (PCR). For this approach, sequence-specific primers
may be designed based on the sequences provided herein, and may be purchased or
synthesized.

An amplified portion of a polynucleotide of the present invention may be used 
to isolate a full length gene from a suitable library (e.g., a M. tuberculosis cDNA library) 
using well known techniques. Within such techniques, a library (cDNA or genomic) is
screened using one or more polynucleotide probes or primers suitable for amplification.
Preferably, a library is size-selected to include larger molecules. Random primed libraries 
may also be preferred for identifying 5' and upstream regions of genes. Genomic libraries 
are preferred for obtaining introns and extending 5' sequences. 

For hybridization techniques, a partial sequence may be labeled (e.g., by nick-translation
or end-labeling with 32P) using well known techniques. A bacterial or
bacteriophage library is then generally screened by hybridizing filters containing denatured
bacterial colonies (or lawns containing phage plaques) with the labeled probe (see Sambrook
et al., Molecular Cloning: A Laboratory Manual (1989)). Hybridizing colonies or plaques
are selected and expanded, and the DNA is isolated for further analysis. cDNA clones may
be analyzed to determine the amount of additional sequence by, for example, PCR using a
primer from the partial sequence and a primer from the vector. Restriction maps and partial
sequences may be generated to identify one or more overlapping clones. The complete
sequence may then be determined using standard techniques, which may involve generating a
series of deletion clones. The resulting overlapping sequences can then be assembled into a
single contiguous sequence. A full length cDNA molecule can be generated by ligating
suitable fragments, using well known techniques.

Alternatively, there are numerous amplification techniques for obtaining a full
length coding sequence from a partial cDNA sequence. Within such techniques,
amplification is generally performed via PCR. Any of a variety of commercially available
kits may be used to perform the amplification step. Primers may be designed using, for
example, software well known in the art. Primers are preferably 22-30 nucleotides in length,
have a GC content of at least 50% and anneal to the target sequence at temperatures of about
68°C to 72°C. The amplified region may be sequenced as described above, and overlapping
sequences assembled into a contiguous sequence.
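
The length and GC-content preferences stated above lend themselves to a quick screen of candidate primers. The sketch below checks only those two criteria; it does not estimate annealing temperature, since no melting-temperature model is given in the text. The function names are hypothetical.

```python
def gc_fraction(primer):
    """Fraction of G and C bases in a primer (case-insensitive)."""
    primer = primer.upper()
    if not primer:
        raise ValueError("primer must not be empty")
    return (primer.count("G") + primer.count("C")) / len(primer)


def meets_primer_preferences(primer, min_len=22, max_len=30, min_gc=0.5):
    """Check the preferred length (22-30 nt) and GC content (at least 50%).

    The annealing-temperature preference (about 68-72 degrees C) is not
    checked here; doing so would require an assumed Tm model.
    """
    return min_len <= len(primer) <= max_len and gc_fraction(primer) >= min_gc


candidate = "GCGCATATGCGCATATGCGCATAT"  # hypothetical 24-mer, 50% GC
print(len(candidate), gc_fraction(candidate))
print(meets_primer_preferences(candidate))
```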

One such amplification technique is inverse PCR (see Triglia et al., Nucl.
Acids Res. 16:8186 (1988)), which uses restriction enzymes to generate a fragment in the
known region of the gene. The fragment is then circularized by intramolecular ligation and
used as a template for PCR with divergent primers derived from the known region. Within
an alternative approach, sequences adjacent to a partial sequence may be retrieved by
amplification with a primer to a linker sequence and a primer specific to a known region.
The amplified sequences are typically subjected to a second round of amplification with the
same linker primer and a second primer specific to the known region. A variation on this
procedure, which employs two primers that initiate extension in opposite directions from the
known sequence, is described in WO 96/38591. Another such technique is known as "rapid
amplification of cDNA ends" or RACE. This technique involves the use of an internal
primer and an external primer, which hybridizes to a polyA region or vector sequence, to
identify sequences that are 5' and 3' of a known sequence. Additional techniques include
capture PCR (Lagerstrom et al., PCR Methods Applic. 1:111-19 (1991)) and walking PCR
(Parker et al., Nucl. Acids Res. 19:3055-60 (1991)). Other methods employing amplification
may also be employed to obtain a full length cDNA sequence.

In certain instances, it is possible to obtain a full length cDNA sequence by
analysis of sequences provided in an expressed sequence tag (EST) database, such as that
available from GenBank. Searches for overlapping ESTs may generally be performed using
well known programs (e.g., NCBI BLAST searches), and such ESTs may be used to generate
a contiguous full length sequence. Full length DNA sequences may also be obtained by
analysis of genomic fragments.

V. POLYNUCLEOTIDE EXPRESSION IN HOST CELLS

In other embodiments of the invention, polynucleotide sequences or fragments
thereof which encode polypeptides of the invention, or fusion proteins or functional
equivalents thereof, may be used in recombinant DNA molecules to direct expression of a
polypeptide in appropriate host cells. Due to the inherent degeneracy of the genetic code,
other DNA sequences that encode substantially the same or a functionally equivalent amino
acid sequence may be produced and these sequences may be used to clone and express a
given polypeptide.

As will be understood by those of skill in the art, it may be advantageous in
some instances to produce polypeptide-encoding nucleotide sequences possessing
non-naturally occurring codons. For example, codons preferred by a particular prokaryotic or
eukaryotic host can be selected to increase the rate of protein expression or to produce a
recombinant RNA transcript having desirable properties, such as a half-life which is longer
than that of a transcript generated from the naturally occurring sequence.

Moreover, the polynucleotide sequences of the present invention can be
engineered using methods generally known in the art in order to alter polypeptide encoding
sequences for a variety of reasons, including but not limited to, alterations which modify the
cloning, processing, and/or expression of the gene product. For example, DNA shuffling by
random fragmentation and PCR reassembly of gene fragments and synthetic oligonucleotides
may be used to engineer the nucleotide sequences. In addition, site-directed mutagenesis may
be used to insert new restriction sites, alter glycosylation patterns, change codon preference,
produce splice variants, or introduce mutations, and so forth.

In another embodiment of the invention, natural, modified, or recombinant
nucleic acid sequences may be ligated to a heterologous sequence to encode a fusion protein.
For example, to screen peptide libraries for inhibitors of polypeptide activity, it may be useful
to encode a chimeric protein that can be recognized by a commercially available antibody. A
fusion protein may also be engineered to contain a cleavage site located between the
polypeptide-encoding sequence and the heterologous protein sequence, so that the
polypeptide may be cleaved and purified away from the heterologous moiety.

Sequences encoding a desired polypeptide may be synthesized, in whole or in
part, using chemical methods well known in the art (see Caruthers, M. H. et al., Nucl. Acids
Res. Symp. Ser. pp. 215-223 (1980), Horn et al., Nucl. Acids Res. Symp. Ser. pp. 225-232
(1980)). Alternatively, the protein itself may be produced using chemical methods to
synthesize the amino acid sequence of a polypeptide, or a portion thereof. For example,
peptide synthesis can be performed using various solid-phase techniques (Roberge et al.,
Science 269:202-204 (1995)) and automated synthesis may be achieved, for example, using
the ABI 431A Peptide Synthesizer (Perkin Elmer, Palo Alto, CA).
5 A newly synthesized peptide may be substantially purified by preparative high 

performance liquid chromatography (e.g., Creighton, Proteins, Structures and Molecular 
Principles (1983)) or other comparable techniques available in the art. The composition of 
the synthetic peptides may be confirmed by amino acid analysis or sequencing (e.g., the 
Edman degradation procedure). Additionally, the amino acid sequence of a polypeptide, or
any part thereof, may be altered during direct synthesis and/or combined using chemical
methods with sequences from other proteins, or any part thereof, to produce a variant 
polypeptide. 

In order to express a desired polypeptide, the nucleotide sequences encoding 
the polypeptide, or functional equivalents, may be inserted into an appropriate expression vector,
i.e., a vector which contains the necessary elements for the transcription and translation of the
inserted coding sequence. Methods which are well known to those skilled in the art may be
used to construct expression vectors containing sequences encoding a polypeptide of interest
and appropriate transcriptional and translational control elements. These methods include in
vitro recombinant DNA techniques, synthetic techniques, and in vivo genetic recombination.
Such techniques are described in Sambrook et al., Molecular Cloning, A Laboratory Manual
(1989), and Ausubel et al., Current Protocols in Molecular Biology (1989).

A variety of expression vector/host systems may be utilized to contain and 
express polynucleotide sequences. These include, but are not limited to, microorganisms 
such as bacteria transformed with recombinant bacteriophage, plasmid, or cosmid DNA
expression vectors; yeast transformed with yeast expression vectors; insect cell systems
infected with virus expression vectors (e.g., baculovirus); plant cell systems transformed with
virus expression vectors (e.g., cauliflower mosaic virus, CaMV; tobacco mosaic virus, TMV)
or with bacterial expression vectors (e.g., Ti or pBR322 plasmids); or animal cell systems.

The "control elements" or "regulatory sequences" present in an expression
vector are those non-translated regions of the vector (enhancers, promoters, 5' and 3'
untranslated regions) which interact with host cellular proteins to carry out transcription and
translation. Such elements may vary in their strength and specificity. Depending on the
vector system and host utilized, any number of suitable transcription and translation elements,
including constitutive and inducible promoters, may be used. For example, when cloning in
bacterial systems, inducible promoters such as the hybrid lacZ promoter of the
PBLUESCRIPT phagemid (Stratagene, La Jolla, Calif.) or PSPORT1 plasmid (Gibco BRL,
Gaithersburg, MD) and the like may be used. In mammalian cell systems, promoters from
mammalian genes or from mammalian viruses are generally preferred. If it is necessary to
generate a cell line that contains multiple copies of the sequence encoding a polypeptide,
vectors based on SV40 or EBV may be advantageously used with an appropriate selectable
marker.

In bacterial systems, a number of expression vectors may be selected 
depending upon the use intended for the expressed polypeptide. For example, when large
quantities are needed, such as for the induction of antibodies, vectors which direct high
level expression of fusion proteins that are readily purified may be used. Such vectors
include, but are not limited to, the multifunctional E. coli cloning and expression vectors such
as BLUESCRIPT (Stratagene), in which the sequence encoding the polypeptide of interest
may be ligated into the vector in frame with sequences for the amino-terminal Met and the
subsequent 7 residues of β-galactosidase so that a hybrid protein is produced; pIN vectors
(Van Heeke & Schuster, J. Biol. Chem. 264:5503-5509 (1989)); and the like. pGEX vectors
(Promega, Madison, Wis.) may also be used to express foreign polypeptides as fusion
proteins with glutathione S-transferase (GST). In general, such fusion proteins are soluble
and can easily be purified from lysed cells by adsorption to glutathione-agarose beads
followed by elution in the presence of free glutathione. Proteins made in such systems may
be designed to include heparin, thrombin, or factor Xa protease cleavage sites so that the
cloned polypeptide of interest can be released from the GST moiety at will.

In the yeast, Saccharomyces cerevisiae, a number of vectors containing 
constitutive or inducible promoters such as alpha factor, alcohol oxidase, and PGH may be
used. For reviews, see Ausubel et al. (supra) and Grant et al., Methods Enzymol. 153:516-544
(1987).

In cases where plant expression vectors are used, the expression of sequences 
encoding polypeptides may be driven by any of a number of promoters. For example, viral 
promoters such as the 35S and 19S promoters of CaMV may be used alone or in combination
with the omega leader sequence from TMV (Takamatsu, EMBO J. 6:307-311 (1987)).
Alternatively, plant promoters such as the small subunit of RUBISCO or heat shock
promoters may be used (Coruzzi et al., EMBO J. 3:1671-1680 (1984); Broglie et al., Science
224:838-843 (1984); and Winter et al., Results Probl. Cell Differ. 17:85-105 (1991)). These
constructs can be introduced into plant cells by direct DNA transformation or
pathogen-mediated transfection. Such techniques are described in a number of generally available
reviews (see, e.g., Hobbs in McGraw Hill Yearbook of Science and Technology pp. 191-196
(1992)).

An insect system may also be used to express a polypeptide of interest. For
example, in one such system, Autographa californica nuclear polyhedrosis virus (AcNPV) is
used as a vector to express foreign genes in Spodoptera frugiperda cells or in Trichoplusia
larvae. The sequences encoding the polypeptide may be cloned into a non-essential region of
the virus, such as the polyhedrin gene, and placed under control of the polyhedrin promoter.
Successful insertion of the polypeptide-encoding sequence will render the polyhedrin gene
inactive and produce recombinant virus lacking coat protein. The recombinant viruses may
then be used to infect, for example, S. frugiperda cells or Trichoplusia larvae in which the
polypeptide of interest may be expressed (Engelhard et al., Proc. Natl. Acad. Sci. U.S.A.
91:3224-3227 (1994)).

In mammalian host cells, a number of viral-based expression systems are
generally available. For example, in cases where an adenovirus is used as an expression
vector, sequences encoding a polypeptide of interest may be ligated into an adenovirus
transcription/translation complex consisting of the late promoter and tripartite leader
sequence. Insertion in a non-essential E1 or E3 region of the viral genome may be used to
obtain a viable virus which is capable of expressing the polypeptide in infected host cells
(Logan & Shenk, Proc. Natl. Acad. Sci. U.S.A. 81:3655-3659 (1984)). In addition,
transcription enhancers, such as the Rous sarcoma virus (RSV) enhancer, may be used to 
increase expression in mammalian host cells. 

Specific initiation signals may also be used to achieve more efficient
translation of sequences encoding a polypeptide of interest. Such signals include the ATG
initiation codon and adjacent sequences. In cases where sequences encoding the polypeptide,
its initiation codon, and upstream sequences are inserted into the appropriate expression
vector, no additional transcriptional or translational control signals may be needed. However,
in cases where only coding sequence, or a portion thereof, is inserted, exogenous translational
control signals including the ATG initiation codon should be provided. Furthermore, the
initiation codon should be in the correct reading frame to ensure translation of the entire
insert. Exogenous translational elements and initiation codons may be of various origins,
both natural and synthetic. The efficiency of expression may be enhanced by the inclusion of
enhancers which are appropriate for the particular cell system which is used, such as those
described in the literature (Scharf et al., Results Probl. Cell Differ. 20:125-162 (1994)).
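
The requirement that an ATG initiation codon be present and in the correct reading frame can be checked on a candidate coding insert before cloning. The following Python sketch is illustrative only; the function name and the specific checks are assumptions made for this example, not a procedure described in this specification.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def check_coding_insert(seq):
    """Return a list of reading-frame problems found in a coding insert.

    An empty list means the insert starts with ATG, has a length that is a
    multiple of three, and has no in-frame stop codon before its last codon.
    """
    seq = seq.upper()
    problems = []
    if not seq.startswith("ATG"):
        problems.append("no ATG initiation codon at the start")
    if len(seq) % 3 != 0:
        problems.append("length is not a multiple of three")
    # Split into complete codons, ignoring any trailing partial codon.
    codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    if any(codon in STOP_CODONS for codon in codons[:-1]):
        problems.append("in-frame stop codon before the end of the insert")
    return problems


print(check_coding_insert("ATGGCTAAATGA"))   # well-formed insert
print(check_coding_insert("GCTAAATGA"))      # coding sequence only, no ATG
```

An insert reported as lacking an ATG corresponds to the case described above in which exogenous translational control signals, including the initiation codon, should be provided.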

In addition, a host cell strain may be chosen for its ability to modulate the 
expression of the inserted sequences or to process the expressed protein in the desired
fashion. Such modifications of the polypeptide include, but are not limited to, acetylation,
carboxylation, glycosylation, phosphorylation, lipidation, and acylation. Post-translational
processing which cleaves a "prepro" form of the protein may also be used to facilitate correct
insertion, folding and/or function. Different host cells such as CHO, HeLa, MDCK,
HEK293, and WI38, which have specific cellular machinery and characteristic mechanisms
for such post-translational activities, may be chosen to ensure the correct modification and
processing of the foreign protein.

For long-term, high-yield production of recombinant proteins, stable 
expression is generally preferred. For example, cell lines which stably express a 
polynucleotide of interest may be transformed using expression vectors which may contain
viral origins of replication and/or endogenous expression elements and a selectable marker
gene on the same or on a separate vector. Following the introduction of the vector, cells may
be allowed to grow for 1-2 days in an enriched media before they are switched to selective
media. The purpose of the selectable marker is to confer resistance to selection, and its
presence allows growth and recovery of cells which successfully express the introduced
sequences. Resistant clones of stably transformed cells may be proliferated using tissue
culture techniques appropriate to the cell type.

Any number of selection systems may be used to recover transformed cell 
lines. These include, but are not limited to, the herpes simplex virus thymidine kinase (Wigler
et al., Cell 11:223-32 (1977)) and adenine phosphoribosyltransferase (Lowy et al., Cell
22:817-23 (1990)) genes which can be employed in tk⁻ or aprt⁻ cells, respectively.
Also, antimetabolite, antibiotic or herbicide resistance can be used as the basis for selection;
for example, dhfr, which confers resistance to methotrexate (Wigler et al., Proc. Natl. Acad.
Sci. U.S.A. 77:3567-70 (1980)); npt, which confers resistance to the aminoglycosides,
neomycin and G-418 (Colbere-Garapin et al., J. Mol. Biol. 150:1-14 (1981)); and als or pat,
which confer resistance to chlorsulfuron and phosphinotricin acetyltransferase, respectively
(Murry, supra). Additional selectable genes have been described, for example, trpB, which
allows cells to utilize indole in place of tryptophan, or hisD, which allows cells to utilize
histidinol in place of histidine (Hartman & Mulligan, Proc. Natl. Acad. Sci. U.S.A. 85:8047-51
(1988)). Recently, the use of visible markers has gained popularity with such markers as
anthocyanins, β-glucuronidase and its substrate GUS, and luciferase and its substrate
luciferin, being widely used not only to identify transformants, but also to quantify the
amount of transient or stable protein expression attributable to a specific vector system
(Rhodes et al., Methods Mol. Biol. 55:121-131 (1995)).

Although the presence/absence of marker gene expression suggests that the
gene of interest is also present, its presence and expression may need to be confirmed. For
example, if the sequence encoding a polypeptide is inserted within a marker gene sequence,
recombinant cells containing sequences can be identified by the absence of marker gene
function. Alternatively, a marker gene can be placed in tandem with a polypeptide-encoding
sequence under the control of a single promoter. Expression of the marker gene in response
to induction or selection usually indicates expression of the tandem gene as well.

Alternatively, host cells which contain and express a desired polynucleotide 
sequence may be identified by a variety of procedures known to those of skill in the art. 
These procedures include, but are not limited to, DNA-DNA or DNA-RNA hybridizations
and protein bioassay or immunoassay techniques which include membrane, solution, or chip
based technologies for the detection and/or quantification of nucleic acid or protein.

A variety of protocols for detecting and measuring the expression of 
polynucleotide-encoded products, using either polyclonal or monoclonal antibodies specific 
for the product, are known in the art. Examples include enzyme-linked immunosorbent assay
(ELISA), radioimmunoassay (RIA), and fluorescence activated cell sorting (FACS). A
two-site, monoclonal-based immunoassay utilizing monoclonal antibodies reactive to two
non-interfering epitopes on a given polypeptide may be preferred for some applications, but a
competitive binding assay may also be employed. These and other assays are described,
among other places, in Hampton et al., Serological Methods, a Laboratory Manual (1990)
and Maddox et al., J. Exp. Med. 158:1211-1216 (1983).

A wide variety of labels and conjugation techniques are known by those 
skilled in the art and may be used in various nucleic acid and amino acid assays. Means for 
producing labeled hybridization or PCR probes for detecting sequences related to 
polynucleotides include oligolabeling, nick translation, end-labeling or PCR amplification
using a labeled nucleotide. Alternatively, the sequences, or any portions thereof, may be
cloned into a vector for the production of an mRNA probe. Such vectors are known in the
art, are commercially available, and may be used to synthesize RNA probes in vitro by
addition of an appropriate RNA polymerase such as T7, T3, or SP6 and labeled nucleotides.
These procedures may be conducted using a variety of commercially available kits. Suitable
reporter molecules or labels which may be used include radionuclides, enzymes, fluorescent,
chemiluminescent, or chromogenic agents as well as substrates, cofactors, inhibitors,
magnetic particles, and the like.

Host cells transformed with a polynucleotide sequence of interest may be
cultured under conditions suitable for the expression and recovery of the protein from cell
culture. The protein produced by a recombinant cell may be secreted or contained
intracellularly depending on the sequence and/or the vector used. As will be understood by
those of skill in the art, expression vectors containing polynucleotides of the invention may
be designed to contain signal sequences which direct secretion of the encoded polypeptide
through a prokaryotic or eukaryotic cell membrane. Other recombinant constructions may be
used to join sequences encoding a polypeptide of interest to nucleotide sequence encoding a 
polypeptide domain which will facilitate purification of soluble proteins. Such purification 
facilitating domains include, but are not limited to, metal chelating peptides such as
histidine-tryptophan modules that allow purification on immobilized metals, protein A domains that
allow purification on immobilized immunoglobulin, and the domain utilized in the FLAGS
extension/affinity purification system (Immunex Corp., Seattle, Washington). The inclusion
of cleavable linker sequences such as those specific for Factor Xa or enterokinase
(Invitrogen, San Diego, Calif.) between the purification domain and the encoded polypeptide
may be used to facilitate purification. One such expression vector provides for expression of
a fusion protein containing a polypeptide of interest and a nucleic acid encoding 6 histidine
residues preceding a thioredoxin or an enterokinase cleavage site. The histidine residues
facilitate purification on IMIAC (immobilized metal ion affinity chromatography) as
described in Porath et al., Prot. Exp. Purif. 3:263-281 (1992), while the enterokinase cleavage
site provides a means for purifying the desired polypeptide from the fusion protein. A
discussion of vectors which contain fusion proteins is provided in Kroll et al., DNA Cell Biol.
12:441-453 (1993).
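
The layout of such a fusion protein, and the release of the polypeptide at the cleavage site, can be sketched at the amino acid sequence level as follows. This Python example is illustrative only: the leading Met, the order of the elements, the function names, and the example polypeptide are assumptions, and the enterokinase recognition sequence Asp-Asp-Asp-Asp-Lys (cleaved after the lysine) is supplied from general knowledge rather than from this specification.

```python
HIS_TAG = "HHHHHH"            # six histidine residues for metal affinity capture
ENTEROKINASE_SITE = "DDDDK"   # enterokinase cleaves after the lysine


def build_fusion(polypeptide):
    """Assemble Met + His tag + enterokinase site + polypeptide of interest."""
    return "M" + HIS_TAG + ENTEROKINASE_SITE + polypeptide


def release_polypeptide(fusion):
    """Split a fusion protein at the first enterokinase site.

    Returns (tag_fragment, released_polypeptide).
    """
    index = fusion.find(ENTEROKINASE_SITE)
    if index == -1:
        raise ValueError("no enterokinase site found")
    cut = index + len(ENTEROKINASE_SITE)
    return fusion[:cut], fusion[cut:]


fusion = build_fusion("AKTIAYDEEAR")  # hypothetical polypeptide of interest
tag_fragment, released = release_polypeptide(fusion)
print(fusion)
print(tag_fragment, released)
```

Placing the cleavage site directly upstream of the polypeptide means that, in this sketch, no tag-derived residues remain on the released product.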

In addition to recombinant production methods, polypeptides of the invention, 
and fragments thereof, may be produced by direct peptide synthesis using solid-phase
techniques (Merrifield, J. Am. Chem. Soc. 85:2149-2154 (1963)). Protein synthesis may be
performed using manual techniques or by automation. Automated synthesis may be
achieved, for example, using Applied Biosystems 431A Peptide Synthesizer (Perkin Elmer).
Alternatively, various fragments may be chemically synthesized separately and combined
using chemical methods to produce the full length molecule.

VI. IN VIVO POLYNUCLEOTIDE DELIVERY TECHNIQUES

In additional embodiments, genetic constructs comprising one or more of the 
polynucleotides of the invention are introduced into cells in vivo. This may be achieved
using any of a variety of well known approaches, several of which are outlined below for the
purpose of illustration.

A. Adenovirus

One of the preferred methods for in vivo delivery of one or more nucleic acid
sequences involves the use of an adenovirus expression vector. "Adenovirus expression
vector" is meant to include those constructs containing adenovirus sequences sufficient to (a)
support packaging of the construct and (b) to express a polynucleotide that has been cloned
therein in a sense or antisense orientation. Of course, in the context of an antisense construct,
expression does not require that the gene product be synthesized. 

The expression vector comprises a genetically engineered form of an 
adenovirus. Knowledge of the genetic organization of adenovirus, a 36 kb, linear,
double-stranded DNA virus, allows substitution of large pieces of adenoviral DNA with foreign
sequences up to 7 kb (Grunhaus & Horwitz, 1992). In contrast to retrovirus, the adenoviral
infection of host cells does not result in chromosomal integration because adenoviral DNA
can replicate in an episomal manner without potential genotoxicity. Also, adenoviruses are
structurally stable, and no genome rearrangement has been detected after extensive
amplification. Adenovirus can infect virtually all epithelial cells regardless of their cell cycle
stage. So far, adenoviral infection appears to be linked only to mild disease such as acute 
respiratory disease in humans. 

Adenovirus is particularly suitable for use as a gene transfer vector because of 
its mid-sized genome, ease of manipulation, high titer, wide target-cell range and high
infectivity. Both ends of the viral genome contain 100-200 base pair inverted repeats (ITRs),
which are cis elements necessary for viral DNA replication and packaging. The early (E) and
late (L) regions of the genome contain different transcription units that are divided by the
onset of viral DNA replication. The E1 region (E1A and E1B) encodes proteins responsible
for the regulation of transcription of the viral genome and a few cellular genes. The
expression of the E2 region (E2A and E2B) results in the synthesis of the proteins for viral
DNA replication. These proteins are involved in DNA replication, late gene expression and
host cell shut-off (Renan, 1990). The products of the late genes, including the majority of the
viral capsid proteins, are expressed only after significant processing of a single primary
transcript issued by the major late promoter (MLP). The MLP (located at 16.8 m.u.) is
particularly efficient during the late phase of infection, and all the mRNAs issued from this
promoter possess a 5'-tripartite leader (TPL) sequence which makes them preferred mRNAs
for translation.

In a current system, recombinant adenovirus is generated from homologous
recombination between shuttle vector and provirus vector. Due to the possible recombination
between two proviral vectors, wild-type adenovirus may be generated from this process.
Therefore, it is critical to isolate a single clone of virus from an individual plaque and 
examine its genomic structure. 

Generation and propagation of the current adenovirus vectors, which are
replication deficient, depend on a unique helper cell line, designated 293, which was
transformed from human embryonic kidney cells by Ad5 DNA fragments and constitutively
expresses E1 proteins (Graham et al., 1977). Since the E3 region is dispensable from the
adenovirus genome (Jones & Shenk, 1978), the current adenovirus vectors, with the help of
293 cells, carry foreign DNA in either the E1, the E3 or both regions (Graham & Prevec,
1991). In nature, adenovirus can package approximately 105% of the wild-type genome
(Ghosh-Choudhury et al., 1987), providing capacity for about 2 extra kB of DNA. Combined
with the approximately 5.5 kB of DNA that is replaceable in the E1 and E3 regions, the
maximum capacity of the current adenovirus vector is under 7.5 kB, or about 15% of the total
length of the vector. More than 80% of the adenovirus viral genome remains in the vector
backbone and is the source of vector-borne cytotoxicity. Also, the replication deficiency of
the E1-deleted virus is incomplete. For example, leakage of viral gene expression has been
observed with the currently available vectors at high multiplicities of infection (MOI)
(Mulligan, 1993).
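
The capacity estimate above can be reproduced with a short calculation. The Python sketch below uses the 36 kb genome size, the 105% packaging limit, and the 5.5 kB of replaceable E1 and E3 sequence stated in this section; the function and constant names are hypothetical and introduced only for illustration.

```python
GENOME_KB = 36.0        # adenovirus genome size stated in this section
PACKAGING_LIMIT = 1.05  # approximately 105% of the wild-type genome
REPLACEABLE_KB = 5.5    # approximate replaceable DNA in the E1 and E3 regions


def max_insert_kb(genome_kb=GENOME_KB, limit=PACKAGING_LIMIT,
                  replaceable_kb=REPLACEABLE_KB):
    """Foreign DNA capacity: extra packaging headroom plus replaceable DNA."""
    extra_kb = genome_kb * (limit - 1.0)
    return extra_kb + replaceable_kb


def fits(insert_kb):
    """True if an insert of the given size is within the estimated capacity."""
    return insert_kb <= max_insert_kb()


print(round(max_insert_kb(), 2))  # headroom of about 1.8 kb plus 5.5 kb
print(fits(4.0), fits(9.0))
```

With these inputs the headroom comes to roughly 1.8 kb, which the text rounds to about 2 kB, and the total stays under the 7.5 kB limit given above.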

Helper cell lines may be derived from human cells such as human embryonic
kidney cells, muscle cells, hematopoietic cells or other human embryonic mesenchymal or
epithelial cells. Alternatively, the helper cells may be derived from the cells of other
mammalian species that are permissive for human adenovirus. Such cells include, e.g., Vero
cells or other monkey embryonic mesenchymal or epithelial cells. As stated above, the
currently preferred helper cell line is 293.

Recently, Racher et al. (1995) disclosed improved methods for culturing 293
cells and propagating adenovirus. In one format, natural cell aggregates are grown by
inoculating individual cells into 1 liter siliconized spinner flasks (Techne, Cambridge, UK)
containing 100-200 ml of medium. Following stirring at 40 rpm, the cell viability is
estimated with trypan blue. In another format, Fibra-Cel microcarriers (Bibby Sterlin, Stone,
UK) (5 g/l) are employed as follows. A cell inoculum, resuspended in 5 ml of medium, is
added to the carrier (50 ml) in a 250 ml Erlenmeyer flask and left stationary, with occasional
agitation, for 1 to 4 h. The medium is then replaced with 50 ml of fresh medium and shaking
initiated. For virus production, cells are allowed to grow to about 80% confluence, after
which time the medium is replaced (to 25% of the final volume) and adenovirus added at an
MOI of 0.05. Cultures are left stationary overnight, following which the volume is increased
to 100% and shaking commenced for another 72 h.
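
The amount of virus corresponding to a given MOI follows from the definition of MOI as infectious units per cell. The Python sketch below is a minimal illustration; the cell count and stock titer in the example are hypothetical values chosen for the arithmetic and are not taken from the cited method.

```python
def pfu_required(cell_count, moi):
    """Plaque-forming units needed to infect cell_count cells at the given MOI."""
    return cell_count * moi


def inoculum_volume_ml(cell_count, moi, titer_pfu_per_ml):
    """Volume of virus stock (ml) that delivers the required PFU."""
    if titer_pfu_per_ml <= 0:
        raise ValueError("titer must be positive")
    return pfu_required(cell_count, moi) / titer_pfu_per_ml


# Hypothetical culture of 2e7 cells and a stock at 1e9 PFU/ml, at an MOI of 0.05.
print(pfu_required(2e7, 0.05))
print(inoculum_volume_ml(2e7, 0.05, 1e9))
```

At a low MOI such as 0.05 only a fraction of the cells are infected initially, so the required inoculum is small relative to the culture volume.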

Other than the requirement that the adenovirus vector be replication defective,
or at least conditionally defective, the nature of the adenovirus vector is not believed to be
crucial to the successful practice of the invention. The adenovirus may be of any of the 42
different known serotypes or subgroups A-F. Adenovirus type 5 of subgroup C is the
preferred starting material in order to obtain a conditional replication-defective adenovirus
vector for use in the present invention, since Adenovirus type 5 is a human adenovirus about
which a great deal of biochemical and genetic information is known, and it has historically
been used for most constructions employing adenovirus as a vector.

As stated above, the typical vector according to the present invention is 
replication defective and will not have an adenovirus E1 region. Thus, it will be most
convenient to introduce the polynucleotide encoding the gene of interest at the position from
which the E1-coding sequences have been removed. However, the position of insertion of
the construct within the adenovirus sequences is not critical to the invention. The
polynucleotide encoding the gene of interest may also be inserted in lieu of the deleted E3
region in E3 replacement vectors as described by Karlsson et al. (1986) or in the E4 region
where a helper cell line or helper virus complements the E4 defect.

Adenovirus is easy to grow and manipulate and exhibits broad host range in 
vitro and in vivo. This group of viruses can be obtained in high titers, e.g., 10^9-10^11
plaque-forming units per ml, and they are highly infective. The life cycle of adenovirus does not
require integration into the host cell genome. The foreign genes delivered by adenovirus
vectors are episomal and, therefore, have low genotoxicity to host cells. No side effects have
been reported in studies of vaccination with wild-type adenovirus (Couch et al., 1963; Top et
al., 1971), demonstrating their safety and therapeutic potential as in vivo gene transfer
vectors.

Adenovirus vectors have been used in eukaryotic gene expression (Levrero et
al., 1991; Gomez-Foix et al., 1992) and vaccine development (Grunhaus & Horwitz, 1992;
Graham & Prevec, 1992). Recently, animal studies suggested that recombinant adenovirus
could be used for gene therapy (Stratford-Perricaudet & Perricaudet, 1991;
Stratford-Perricaudet et al., 1990; Rich et al., 1993). Studies in administering recombinant adenovirus
to different tissues include trachea instillation (Rosenfeld et al., 1991; Rosenfeld et al., 1992),
muscle injection (Ragot et al., 1993), peripheral intravenous injections (Herz & Gerard,
1993) and stereotactic inoculation into the brain (Le Gal La Salle et al., 1993).

B. Retroviruses 

The retroviruses are a group of single-stranded RNA viruses characterized by
an ability to convert their RNA to double-stranded DNA in infected cells by a process of
reverse-transcription (Coffin, 1990). The resulting DNA then stably integrates into cellular
chromosomes as a provirus and directs synthesis of viral proteins. The integration results in
the retention of the viral gene sequences in the recipient cell and its descendants. The
retroviral genome contains three genes, gag, pol, and env, that code for capsid proteins,
polymerase enzyme, and envelope components, respectively. A sequence found upstream
from the gag gene contains a signal for packaging of the genome into virions. Two long
terminal repeat (LTR) sequences are present at the 5' and 3' ends of the viral genome. These
contain strong promoter and enhancer sequences and are also required for integration in the
host cell genome (Coffin, 1990).

In order to construct a retroviral vector, a nucleic acid encoding one or more 
oligonucleotide or polynucleotide sequences of interest is inserted into the viral genome in 
the place of certain viral sequences to produce a virus that is replication-defective. In order 
to produce virions, a packaging cell line containing the gag, pol, and env genes but without
the LTR and packaging components is constructed (Mann et al., 1983). When a recombinant
plasmid containing a cDNA, together with the retroviral LTR and packaging sequences, is
introduced into this cell line (by calcium phosphate precipitation for example), the packaging
sequence allows the RNA transcript of the recombinant plasmid to be packaged into viral
particles, which are then secreted into the culture media (Nicolas & Rubenstein, 1988; Temin,
1986; Mann et al., 1983). The media containing the recombinant retroviruses is then
collected, optionally concentrated, and used for gene transfer. Retroviral vectors are able to
infect a broad variety of cell types. However, integration and stable expression require the
division of host cells (Paskind et al., 1975).

A novel approach designed to allow specific targeting of retrovirus vectors
was recently developed based on the chemical modification of a retrovirus by the chemical
addition of lactose residues to the viral envelope. This modification could permit the specific
infection of hepatocytes via sialoglycoprotein receptors.

A different approach to targeting of recombinant retroviruses was designed in
which biotinylated antibodies against a retroviral envelope protein and against a specific cell
receptor were used. The antibodies were coupled via the biotin components by using
streptavidin (Roux et al., 1989). Using antibodies against major histocompatibility complex
class I and class II antigens, they demonstrated the infection of a variety of human cells that
bore those surface antigens with an ecotropic virus in vitro (Roux et al., 1989).

C. Adeno-Associated Viruses 

AAV (Ridgeway, 1988; Hermonat & Muzyczka, 1984) is a parvovirus,
discovered as a contamination of adenoviral stocks. It is a ubiquitous virus (antibodies are
present in 85% of the US human population) that has not been linked to any disease. It is
also classified as a dependovirus, because its replication is dependent on the presence of a
helper virus, such as adenovirus. Five serotypes have been isolated, of which AAV-2 is the
best characterized. AAV has a single-stranded linear DNA that is encapsidated into capsid 
proteins VP1, VP2 and VP3 to form an icosahedral virion of 20 to 24 nm in diameter 
(Muzyczka & McLaughlin, 1988). 

The AAV DNA is approximately 4.7 kilobases long. It contains two open
reading frames and is flanked by two ITRs. There are two major genes in the AAV genome:
rep and cap. The rep gene codes for proteins responsible for viral replication, whereas cap
codes for capsid proteins VP1-3. Each ITR forms a T-shaped hairpin structure. These
terminal repeats are the only essential cis components of the AAV for chromosomal
integration. Therefore, the AAV can be used as a vector with all viral coding sequences
removed and replaced by the cassette of genes for delivery. Three viral promoters have been
identified and named p5, p19, and p40, according to their map position. Transcription from
p5 and p19 results in production of rep proteins, and transcription from p40 produces the
capsid proteins (Hermonat & Muzyczka, 1984).

There are several factors that prompted researchers to study the possibility of
using rAAV as an expression vector. One is that the requirements for delivering a gene to
integrate into the host chromosome are surprisingly few. It is necessary to have the 145-bp
ITRs, which are only 6% of the AAV genome. This leaves room in the vector to assemble a
4.5-kb DNA insertion. While this carrying capacity may prevent the AAV from delivering
large genes, it is amply suited for delivering the antisense constructs of the present invention.
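
The proportion of the AAV genome occupied by the ITRs can be checked with a short calculation. The Python sketch below assumes two 145-bp ITRs and a genome of 4,700 bp, taking the "approximately 4.7 kilobases" stated above as an exact figure for the purpose of illustration; the names are hypothetical.

```python
AAV_GENOME_BP = 4700  # "approximately 4.7 kilobases", treated as exact here
ITR_BP = 145          # length of each inverted terminal repeat
ITR_COUNT = 2         # one ITR at each end of the genome


def itr_fraction():
    """Fraction of the genome occupied by the two ITRs."""
    return ITR_COUNT * ITR_BP / AAV_GENOME_BP


def room_for_insert_bp():
    """Base pairs left for foreign DNA when only the ITRs are retained."""
    return AAV_GENOME_BP - ITR_COUNT * ITR_BP


print(f"{itr_fraction():.1%}")
print(room_for_insert_bp())
```

Under these assumptions the ITRs account for about 6% of the genome and leave roughly 4.4 kb, close to the 4.5-kb insertion size quoted above; the small difference reflects the approximate genome length.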

AAV is also a good choice of delivery vehicles due to its safety. There is a 
relatively complicated rescue mechanism: not only wild type adenovirus but also AAV genes
are required to mobilize rAAV. Likewise, AAV is not pathogenic and not associated with
any disease. The removal of viral coding sequences minimizes immune reactions to viral 
gene expression, and therefore, rAAV does not evoke an inflammatory response. 

D. Other Viral Vectors as Expression Constructs 

Other viral vectors may be employed as expression constructs in the present
invention for the delivery of oligonucleotide or polynucleotide sequences to a host cell.
Vectors derived from viruses such as vaccinia virus (Ridgeway, 1988; Coupar et al., 1988),
lentiviruses, polio viruses and herpes viruses may be employed. They offer several attractive
features for various mammalian cells (Friedmann, 1989; Ridgeway, 1988; Coupar et al.,
1988; Horwich et al., 1990).

With the recent recognition of defective hepatitis B viruses, new insight was 
gained into the structure-function relationship of different viral sequences. In vitro studies 
showed that the virus could retain the ability for helper-dependent packaging and reverse 
transcription despite the deletion of up to 80% of its genome (Horwich et al., 1990). This 
suggested that large portions of the genome could be replaced with foreign genetic material. 
The hepatotropism and persistence (integration) were particularly attractive properties for 
liver-directed gene transfer. Chang et al. (1991) introduced the chloramphenicol 
acetyltransferase (CAT) gene into duck hepatitis B virus genome in the place of the 
polymerase, surface, and pre-surface coding sequences. It was cotransfected with wild-type 
virus into an avian hepatoma cell line. Culture media containing high titers of the 
recombinant virus were used to infect primary duckling hepatocytes. Stable CAT gene 
expression was detected for at least 24 days after transfection (Chang et al., 1991). 

E. Non-viral vectors 

In order to effect expression of the oligonucleotide or polynucleotide 
sequences of the present invention, the expression construct must be delivered into a cell. 
This delivery may be accomplished in vitro, as in laboratory procedures for transforming 
cell lines, or in vivo or ex vivo, as in the treatment of certain disease states. As described 
above, one preferred mechanism for delivery is via viral infection where the expression 
construct is encapsulated in an infectious viral particle. 

Once the expression construct has been delivered into the cell, the nucleic acid 
encoding the desired oligonucleotide or polynucleotide sequences may be positioned and 
expressed at different sites. In certain embodiments, the nucleic acid encoding the construct 
may be stably integrated into the genome of the cell. This integration may be in the specific 
location and orientation via homologous recombination (gene replacement) or it may be 
integrated in a random, non-specific location (gene augmentation). In yet further 
embodiments, the nucleic acid may be stably maintained in the cell as a separate, episomal 
segment of DNA. Such nucleic acid segments or "episomes" encode sequences sufficient to 
permit maintenance and replication independent of or in synchronization with the host cell 
cycle. How the expression construct is delivered to a cell and where in the cell the nucleic 
acid remains is dependent on the type of expression construct employed. 

In certain embodiments of the invention, the expression construct comprising 
one or more oligonucleotide or polynucleotide sequences may simply consist of naked 
recombinant DNA or plasmids. Transfer of the construct may be performed by any of the 
methods mentioned above which physically or chemically permeabilize the cell membrane. 
This is particularly applicable for transfer in vitro but it may be applied to in vivo use as well. 
Dubensky et al. (1984) successfully injected polyomavirus DNA in the form of calcium 
phosphate precipitates into liver and spleen of adult and newborn mice, demonstrating active 
viral replication and acute infection. Benvenisty & Reshef (1986) also demonstrated that 
direct intraperitoneal injection of calcium phosphate-precipitated plasmids results in 
expression of the transfected genes. It is envisioned that DNA encoding a gene of interest 
may also be transferred in a similar manner in vivo and express the gene product. 

Another embodiment of the invention for transferring a naked DNA 
expression construct into cells may involve particle bombardment. This method depends on 
the ability to accelerate DNA-coated microprojectiles to a high velocity allowing them to 
pierce cell membranes and enter cells without killing them (Klein et al., 1987). Several 
devices for accelerating small particles have been developed. One such device relies on a 
high voltage discharge to generate an electrical current, which in turn provides the motive 
force (Yang et al., 1990). The microprojectiles used have consisted of biologically inert 
substances such as tungsten or gold beads. 

Selected organs including the liver, skin, and muscle tissue of rats and mice 
have been bombarded in vivo (Yang et al., 1990; Zelenin et al., 1991). This may require 
surgical exposure of the tissue or cells, to eliminate any intervening tissue between the gun 
and the target organ, i.e., ex vivo treatment. Again, DNA encoding a particular gene may be 
delivered via this method and still be incorporated by the present invention. 



VII. POLYPEPTIDE COMPOSITIONS 

The present invention, in other aspects, provides polypeptide compositions. 
Generally, a polypeptide of the invention will be an isolated polypeptide (or an epitope, 
variant, or active fragment thereof) derived from a mammalian species. Preferably, the 
polypeptide is encoded by a polynucleotide sequence disclosed herein or a sequence which 
hybridizes under moderately stringent conditions to a polynucleotide sequence disclosed 
herein. Alternatively, the polypeptide may be defined as a polypeptide which comprises a 
contiguous amino acid sequence from an amino acid sequence disclosed herein, or which 
polypeptide comprises an entire amino acid sequence disclosed herein. 

Immunogenic portions may generally be identified using well known 
techniques, such as those summarized in Paul, Fundamental Immunology, 3rd ed., 243-247 
(1993) and references cited therein. Such techniques include screening polypeptides for the 
ability to react with antigen-specific antibodies, antisera and/or T-cell lines or clones. As 
used herein, antisera and antibodies are "antigen-specific" if they specifically bind to an 
antigen (i.e., they react with the protein in an ELISA or other immunoassay, and do not react 
detectably with unrelated proteins). Such antisera and antibodies may be prepared as 
described herein, and using well known techniques. An immunogenic portion of a 
Mycobacterium sp. protein is a portion that reacts with such antisera and/or T-cells at a level 
that is not substantially less than the reactivity of the full length polypeptide (e.g., in an 
ELISA and/or T-cell reactivity assay). Such immunogenic portions may react within such 
assays at a level that is similar to or greater than the reactivity of the full length polypeptide. 
Such screens may generally be performed using methods well known to those of ordinary 
skill in the art, such as those described in Harlow & Lane, Antibodies: A Laboratory Manual 
(1988). For example, a polypeptide may be immobilized on a solid support and contacted 
with patient sera to allow binding of antibodies within the sera to the immobilized 
polypeptide. Unbound sera may then be removed and bound antibodies detected using, for 
example, 125I-labeled Protein A. 

Polypeptides may be prepared using any of a variety of well known 
techniques. Recombinant polypeptides encoded by DNA sequences as described above may 
be readily prepared from the DNA sequences using any of a variety of expression vectors 
known to those of ordinary skill in the art. Expression may be achieved in any appropriate 
host cell that has been transformed or transfected with an expression vector containing a 
DNA molecule that encodes a recombinant polypeptide. Suitable host cells include 
prokaryotes, yeast, and higher eukaryotic cells, such as mammalian cells and plant cells. 
Preferably, the host cells employed are E. coli, yeast or a mammalian cell line such as COS or 
CHO. Supernatants from suitable host/vector systems which secrete recombinant protein or 
polypeptide into culture media may be first concentrated using a commercially available 
filter. Following concentration, the concentrate may be applied to a suitable purification 
matrix such as an affinity matrix or an ion exchange resin. Finally, one or more reverse 
phase HPLC steps can be employed to further purify a recombinant polypeptide. 

Polypeptides of the invention, immunogenic fragments thereof, and other 
variants having less than about 100 amino acids, and generally less than about 50 amino 
acids, may also be generated by synthetic means, using techniques well known to those of 
ordinary skill in the art. For example, such polypeptides may be synthesized using any of the 
commercially available solid-phase techniques, such as the Merrifield solid-phase synthesis 
method, where amino acids are sequentially added to a growing amino acid chain. See 
Merrifield, J. Am. Chem. Soc. 85:2149-2154 (1963). Equipment for automated synthesis of 
polypeptides is commercially available from suppliers such as Perkin Elmer/Applied 
BioSystems Division (Foster City, CA), and may be operated according to the manufacturer's 
instructions. 

Within certain specific embodiments, a polypeptide may be a fusion protein 
that comprises multiple polypeptides as described herein, or that comprises at least one 
polypeptide as described herein and an unrelated sequence, such as a known tumor protein. 
A fusion partner may, for example, assist in providing T helper epitopes (an immunological 
fusion partner), preferably T helper epitopes recognized by humans, or may assist in 
expressing the protein (an expression enhancer) at higher yields than the native recombinant 
protein. Certain preferred fusion partners are both immunological and expression enhancing 
fusion partners. Other fusion partners may be selected so as to increase the solubility of the 
protein or to enable the protein to be targeted to desired intracellular compartments. Still 
further fusion partners include affinity tags, which facilitate purification of the protein. 

Fusion proteins may generally be prepared using standard techniques, 
including chemical conjugation. Preferably, a fusion protein is expressed as a recombinant 
protein, allowing the production of increased levels, relative to a non-fused protein, in an 
expression system. Briefly, DNA sequences encoding the polypeptide components may be 
assembled separately, and ligated into an appropriate expression vector. The 3' end of the 
DNA sequence encoding one polypeptide component is ligated, with or without a peptide 
linker, to the 5' end of a DNA sequence encoding the second polypeptide component so that 
the reading frames of the sequences are in phase. This permits translation into a single fusion 
protein that retains the biological activity of both component polypeptides. 

A peptide linker sequence may be employed to separate the first and second 
polypeptide components by a distance sufficient to ensure that each polypeptide folds into its 
secondary and tertiary structures. Such a peptide linker sequence is incorporated into the 
fusion protein using standard techniques well known in the art. Suitable peptide linker 
sequences may be chosen based on the following factors: (1) their ability to adopt a flexible 
extended conformation; (2) their inability to adopt a secondary structure that could interact 
with functional epitopes on the first and second polypeptides; and (3) the lack of hydrophobic 
or charged residues that might react with the polypeptide functional epitopes. Preferred 
peptide linker sequences contain Gly, Asn and Ser residues. Other near neutral amino acids, 
such as Thr and Ala, may also be used in the linker sequence. Amino acid sequences which 
may be usefully employed as linkers include those disclosed in Maratea et al., Gene 40:39-46 
(1985); Murphy et al., Proc. Natl. Acad. Sci. USA 83:8258-8262 (1986); U.S. Patent No. 
4,935,233 and U.S. Patent No. 4,751,180. The linker sequence may generally be from 1 to 
about 50 amino acids in length. Linker sequences are not required when the first and second 
polypeptides have non-essential N-terminal amino acid regions that can be used to separate 
the functional domains and prevent steric interference. 

The ligated DNA sequences are operably linked to suitable transcriptional or 
translational regulatory elements. The regulatory elements responsible for expression of 
DNA are located only 5' to the DNA sequence encoding the first polypeptide. Similarly, 
stop codons required to end translation and transcription termination signals are only present 
3' to the DNA sequence encoding the second polypeptide. 
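
The in-frame joining described above can be sketched in a few lines of code. This is a minimal illustration, not a cloning protocol: the function names (`translate`, `fuse_in_frame`) and the short toy sequences are hypothetical and do not correspond to any sequence disclosed herein. The sketch removes the stop codon from the first coding sequence, so that a stop codon is present only 3' to the second component, and verifies that every segment is a whole number of codons so the reading frames stay in phase.

```python
from itertools import product

# Standard genetic code, built in TCAG order (codon -> one-letter amino acid).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
STOP_CODONS = {"TAA", "TAG", "TGA"}


def translate(dna):
    """Translate a DNA coding sequence codon by codon ('*' marks a stop)."""
    dna = dna.upper()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def fuse_in_frame(first_cds, second_cds, linker_cds=""):
    """Join the 3' end of first_cds to the 5' end of second_cds, in frame.

    The stop codon of the first component (if present) is removed, and an
    optional linker coding sequence is placed between the two components.
    """
    first_cds, second_cds, linker_cds = (
        s.upper() for s in (first_cds, second_cds, linker_cds)
    )
    if first_cds[-3:] in STOP_CODONS:
        first_cds = first_cds[:-3]
    for name, seq in (("first", first_cds), ("linker", linker_cds), ("second", second_cds)):
        if len(seq) % 3 != 0:
            raise ValueError(f"{name} sequence would shift the reading frame")
    return first_cds + linker_cds + second_cds


# Hypothetical toy sequences, for illustration only.
first = "ATGGCTAAATAA"    # Met-Ala-Lys-stop
linker = "GGTTCTGGT"      # Gly-Ser-Gly
second = "ATGGAACTGTAA"   # Met-Glu-Leu-stop

fusion = fuse_in_frame(first, second, linker)
print(translate(fusion))
```

Translating the joined sequence yields a single polypeptide in which the linker residues sit between the two components and translation ends only at the stop codon of the second component.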

Fusion proteins are also provided. Such proteins comprise a polypeptide as 
described herein together with an unrelated immunogenic protein. Preferably the 
immunogenic protein is capable of eliciting a recall response. Examples of such proteins 
include tetanus, tuberculosis and hepatitis proteins (see, e.g., Stoute et al., New Engl. J. Med. 
336:86-91 (1997)). 

Within preferred embodiments, an immunological fusion partner is derived 
from protein D, a surface protein of the gram-negative bacterium Haemophilus influenzae B 
(WO 91/18926). Preferably, a protein D derivative comprises approximately the first third of 
the protein (e.g., the first N-terminal 100-110 amino acids), and a protein D derivative may 
be lipidated. Within certain preferred embodiments, the first 109 residues of a lipoprotein D 
fusion partner are included on the N-terminus to provide the polypeptide with additional 
exogenous T-cell epitopes and to increase the expression level in E. coli (thus functioning as 
an expression enhancer). The lipid tail ensures optimal presentation of the antigen to antigen 
presenting cells. Other fusion partners include the non-structural protein from influenza 
virus, NS1 (hemagglutinin). Typically, the N-terminal 81 amino acids are used, although 
different fragments that include T-helper epitopes may be used. 

In another embodiment, the immunological fusion partner is the protein 
known as LYTA, or a portion thereof (preferably a C-terminal portion). LYTA is derived 
from Streptococcus pneumoniae, which synthesizes an N-acetyl-L-alanine amidase known as 
amidase LYTA (encoded by the LytA gene; Gene 43:265-292 (1986)). LYTA is an autolysin 
that specifically degrades certain bonds in the peptidoglycan backbone. The C-terminal 
domain of the LYTA protein is responsible for the affinity to the choline or to some choline 
analogues such as DEAE. This property has been exploited for the development of E. coli 
C-LYTA expressing plasmids useful for expression of fusion proteins. Purification of hybrid 
proteins containing the C-LYTA fragment at the amino terminus has been described (see 
Biotechnology 10:795-798 (1992)). Within a preferred embodiment, a repeat portion of 
LYTA may be incorporated into a fusion protein. A repeat portion is found in the C-terminal 
region starting at residue 178. A particularly preferred repeat portion incorporates residues 
188-305. 

In general, polypeptides (including fusion proteins) and polynucleotides as 
described herein are isolated. An "isolated" polypeptide or polynucleotide is one that is 
removed from its original environment. For example, a naturally-occurring protein is isolated 
if it is separated from some or all of the coexisting materials in the natural system. 
Preferably, such polypeptides are at least about 90% pure, more preferably at least about 95% 
pure and most preferably at least about 99% pure. A polynucleotide is considered to be 
isolated if, for example, it is cloned into a vector that is not a part of the natural environment. 

VIII. T CELLS 

Immunotherapeutic compositions may also, or alternatively, comprise T cells 
specific for a Mycobacterium antigen. Such cells may generally be prepared in vitro or ex 
vivo, using standard procedures. For example, T cells may be isolated from bone marrow, 
peripheral blood, or a fraction of bone marrow or peripheral blood of a patient, using a 
commercially available cell separation system, such as the Isolex™ System, available from 
Nexell Therapeutics, Inc. (Irvine, CA; see also U.S. Patent No. 5,240,856; U.S. Patent No. 
5,215,926; WO 89/06280; WO 91/16116 and WO 92/07243). Alternatively, T cells may be 
derived from related or unrelated humans, non-human mammals, cell lines or cultures. 

T cells may be stimulated with a polypeptide of the invention, polynucleotide 
encoding such a polypeptide, and/or an antigen presenting cell (APC) that expresses such a 
polypeptide. Such stimulation is performed under conditions and for a time sufficient to 
permit the generation of T cells that are specific for the polypeptide. Preferably, the 
polypeptide or polynucleotide is present within a delivery vehicle, such as a microsphere, to 
facilitate the generation of specific T cells. 

T cells are considered to be specific for a polypeptide of the invention if the T 
cells specifically proliferate, secrete cytokines or kill target cells coated with the polypeptide 
or expressing a gene encoding the polypeptide. T cell specificity may be evaluated using any 
of a variety of standard techniques. For example, within a chromium release assay or 
proliferation assay, a stimulation index of more than two fold increase in lysis and/or 
proliferation, compared to negative controls, indicates T cell specificity. Such assays may be 
performed, for example, as described in Chen et al., Cancer Res. 54:1065-1070 (1994). 
Alternatively, detection of the proliferation of T cells may be accomplished by a variety of 
known techniques. For example, T cell proliferation can be detected by measuring an 
increased rate of DNA synthesis (e.g., by pulse-labeling cultures of T cells with tritiated 
thymidine and measuring the amount of tritiated thymidine incorporated into DNA). Contact 
with a polypeptide of the invention (100 ng/ml - 100 µg/ml, preferably 200 ng/ml - 25 µg/ml) 
for 3 - 7 days should result in at least a two fold increase in proliferation of the T cells. 
Contact as described above for 2-3 hours should result in activation of the T cells, as 
measured using standard cytokine assays in which a two fold increase in the level of cytokine 
release (e.g., TNF or IFN-γ) is indicative of T cell activation (see Coligan et al., Current 
Protocols in Immunology, vol. 1 (1998)). T cells that have been activated in response to a 
polypeptide, polynucleotide or polypeptide-expressing APC may be CD4+ and/or CD8+. 
Protein-specific T cells may be expanded using standard techniques. Within preferred 
embodiments, the T cells are derived from a patient, a related donor or an unrelated donor, 
and are administered to the patient following stimulation and expansion. 
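
The two-fold criterion above lends itself to a simple calculation. The following sketch assumes that the stimulation index is taken as the ratio of the mean readout in stimulated wells to the mean readout in negative-control wells; the function names and the counts-per-minute values are hypothetical and serve only to illustrate the comparison.

```python
from statistics import mean


def stimulation_index(stimulated, controls):
    """Ratio of the mean stimulated readout to the mean negative-control readout."""
    control_mean = mean(controls)
    if control_mean <= 0:
        raise ValueError("negative-control mean must be positive")
    return mean(stimulated) / control_mean


def indicates_specificity(stimulated, controls, threshold=2.0):
    """Apply the 'more than two fold increase' criterion from the text."""
    return stimulation_index(stimulated, controls) > threshold


# Hypothetical tritiated-thymidine incorporation readings (counts per minute).
stimulated_cpm = [12500, 11800, 13100]
control_cpm = [3100, 2900, 3300]

si = stimulation_index(stimulated_cpm, control_cpm)
print(f"Stimulation index: {si:.2f}")
print("Specific" if indicates_specificity(stimulated_cpm, control_cpm) else "Not specific")
```

The same comparison could be applied to cytokine release data, where a two fold increase over controls is taken as indicative of T cell activation.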

For therapeutic purposes, CD4+ or CD8+ T cells that proliferate in response to 
a polypeptide, polynucleotide or APC can be expanded in number either in vitro or in vivo. 
Proliferation of such T cells in vitro may be accomplished in a variety of ways. For example, 
the T cells can be re-exposed to a polypeptide, or a short peptide corresponding to an 
immunogenic portion of such a polypeptide, with or without the addition of T cell growth 
factors, such as interleukin-2, and/or stimulator cells that synthesize a polypeptide. 
Alternatively, one or more T cells that proliferate in the presence of a protein can be 
expanded in number by cloning. Methods for cloning cells are well known in the art, and 
include limiting dilution. 

IX. PHARMACEUTICAL COMPOSITIONS 

In additional embodiments, the present invention concerns formulation of one 
or more of the polynucleotide, polypeptide, T-cell and/or antibody compositions disclosed 
herein in pharmaceutically-acceptable solutions for administration to a cell or an animal, 
either alone, or in combination with one or more other modalities of therapy. 

It will also be understood that, if desired, the nucleic acid segment, RNA, 
DNA or PNA compositions that express a polypeptide as disclosed herein may be 
administered in combination with other agents as well, such as, e.g., other proteins or 
polypeptides or various pharmaceutically-active agents. In fact, there is virtually no limit to 
other components that may also be included, provided that the additional agents do not cause a 
significant adverse effect upon contact with the target cells or host tissues. The compositions 
may thus be delivered along with various other agents as required in the particular instance. 
Such compositions may be purified from host cells or other biological sources, or 
alternatively may be chemically synthesized as described herein. Likewise, such 
compositions may further comprise substituted or derivatized RNA or DNA compositions. 

Formulation of pharmaceutically-acceptable excipients and carrier solutions is 
well-known to those of skill in the art, as is the development of suitable dosing and treatment 
regimens for using the particular compositions described herein in a variety of treatment 
regimens, including e.g., oral, parenteral, intravenous, intranasal, and intramuscular 
administration and formulation. 

A. Oral Delivery 

In certain applications, the pharmaceutical compositions disclosed herein may 
be delivered via oral administration to an animal. As such, these compositions may be 
formulated with an inert diluent or with an assimilable edible carrier, or they may be enclosed 
in hard- or soft-shell gelatin capsules, or they may be compressed into tablets, or they may be 
incorporated directly with the food of the diet. 

The active compounds may even be incorporated with excipients and used in 
the form of ingestible tablets, buccal tablets, troches, capsules, elixirs, suspensions, syrups, 
wafers, and the like (Mathiowitz et al., 1997; Hwang et al., 1998; U. S. Patent 5,641,515; U. 
S. Patent 5,580,579 and U. S. Patent 5,792,451, each specifically incorporated herein by 
reference in its entirety). The tablets, troches, pills, capsules and the like may also contain the 
following: a binder, such as gum tragacanth, acacia, cornstarch, or gelatin; excipients, such as 
dicalcium phosphate; a disintegrating agent, such as corn starch, potato starch, alginic acid 
and the like; a lubricant, such as magnesium stearate; and a sweetening agent, such as 
sucrose, lactose or saccharin, may be added or a flavoring agent, such as peppermint, oil of 
wintergreen, or cherry flavoring. When the dosage unit form is a capsule, it may contain, in 
addition to materials of the above type, a liquid carrier. Various other materials may be 
present as coatings or to otherwise modify the physical form of the dosage unit. For instance, 
tablets, pills, or capsules may be coated with shellac, sugar, or both. A syrup or elixir may 
contain the active compound, sucrose as a sweetening agent, methyl and propylparabens as 
preservatives, a dye and flavoring, such as cherry or orange flavor. Of course, any material 
used in preparing any dosage unit form should be pharmaceutically pure and substantially 
non-toxic in the amounts employed. In addition, the active compounds may be incorporated 
into sustained-release preparations and formulations. 

Typically, these formulations may contain at least about 0.1% of the active 
compound or more, although the percentage of the active ingredient(s) may, of course, be 
varied and may conveniently be between about 1 or 2% and about 60% or 70% or more of the 
weight or volume of the total formulation. Naturally, the amount of active compound(s) in 
each therapeutically useful composition may be prepared in such a way that a suitable dosage 
will be obtained in any given unit dose of the compound. Factors such as solubility, 
bioavailability, biological half-life, route of administration, product shelf life, as well as other 
pharmacological considerations will be contemplated by one skilled in the art of preparing 
such pharmaceutical formulations, and as such, a variety of dosages and treatment regimens 
may be desirable. 
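
As a rough illustration of the percentages discussed above, the following sketch computes the weight percentage of active compound in a unit dose and checks it against the lower bound of about 0.1% given in the text. The function names and the example masses are hypothetical; an actual formulation would be set by one skilled in the art using the factors listed above.

```python
MIN_PERCENT_ACTIVE = 0.1  # lower bound from the text ("at least about 0.1%")


def percent_active(active_mass_mg, total_mass_mg):
    """Weight percentage of active compound in the total formulation."""
    if total_mass_mg <= 0:
        raise ValueError("total mass must be positive")
    if not 0 <= active_mass_mg <= total_mass_mg:
        raise ValueError("active mass must lie between 0 and the total mass")
    return 100.0 * active_mass_mg / total_mass_mg


def meets_minimum(active_mass_mg, total_mass_mg):
    """True if the formulation contains at least the minimum percentage."""
    return percent_active(active_mass_mg, total_mass_mg) >= MIN_PERCENT_ACTIVE


# Hypothetical example: 5 mg of active compound in a 250 mg tablet.
example_percent = percent_active(5.0, 250.0)
print(f"{example_percent:.1f}% active compound by weight")
```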

For oral administration the compositions of the present invention may 
alternatively be incorporated with one or more excipients in the form of a mouthwash, 
dentifrice, buccal tablet, oral spray, or sublingual orally-administered formulation. For 
example, a mouthwash may be prepared incorporating the active ingredient in the required 
amount in an appropriate solvent, such as a sodium borate solution (Dobell's Solution). 
Alternatively, the active ingredient may be incorporated into an oral solution such as one 
containing sodium borate, glycerin and potassium bicarbonate, or dispersed in a dentifrice, or 
added in a therapeutically-effective amount to a composition that may include water, binders, 
abrasives, flavoring agents, foaming agents, and humectants. Alternatively the compositions 
may be fashioned into a tablet or solution form that may be placed under the tongue or 
otherwise dissolved in the mouth. 

B. Injectable Delivery 

In certain circumstances it will be desirable to deliver the pharmaceutical 
compositions disclosed herein parenterally, intravenously, intramuscularly, or even 
intraperitoneally as described in U. S. Patent 5,543,158; U. S. Patent 5,641,515 and U. S. 
Patent 5,399,363 (each specifically incorporated herein by reference in its entirety). 
Solutions of the active compounds as free base or pharmacologically acceptable salts may be 
prepared in water suitably mixed with a surfactant, such as hydroxypropylcellulose. 
Dispersions may also be prepared in glycerol, liquid polyethylene glycols, and mixtures 
thereof and in oils. Under ordinary conditions of storage and use, these preparations contain 
a preservative to prevent the growth of microorganisms. 

The pharmaceutical forms suitable for injectable use include sterile aqueous 
solutions or dispersions and sterile powders for the extemporaneous preparation of sterile 
injectable solutions or dispersions (U. S. Patent 5,466,468, specifically incorporated herein by 
reference in its entirety). In all cases the form must be sterile and must be fluid to the extent 
that easy syringability exists. It must be stable under the conditions of manufacture and 
storage and must be preserved against the contaminating action of microorganisms, such as 
bacteria and fungi. The carrier can be a solvent or dispersion medium containing, for 
example, water, ethanol, polyol (e.g., glycerol, propylene glycol, and liquid polyethylene 
glycol, and the like), suitable mixtures thereof, and/or vegetable oils. Proper fluidity may be 
maintained, for example, by the use of a coating, such as lecithin, by the maintenance of the 
required particle size in the case of dispersion and by the use of surfactants. The prevention 
of the action of microorganisms can be facilitated by various antibacterial and antifungal 
agents, for example, parabens, chlorobutanol, phenol, sorbic acid, thimerosal, and the like. In 
many cases, it will be preferable to include isotonic agents, for example, sugars or sodium 
chloride. Prolonged absorption of the injectable compositions can be brought about by the 
use in the compositions of agents delaying absorption, for example, aluminum monostearate 
and gelatin. 

For parenteral administration in an aqueous solution, for example, the solution 
should be suitably buffered if necessary and the liquid diluent first rendered isotonic with 
sufficient saline or glucose. These particular aqueous solutions are especially suitable for 
intravenous, intramuscular, subcutaneous and intraperitoneal administration. In this 
connection, a sterile aqueous medium that can be employed will be known to those of skill in 
the art in light of the present disclosure. For example, one dosage may be dissolved in 1 ml 
of isotonic NaCl solution and either added to 1000 ml of hypodermoclysis fluid or injected at 
the proposed site of infusion (see, e.g., Remington's Pharmaceutical Sciences, 15th Edition, 
pp. 1035-1038 and 1570-1580). Some variation in dosage will necessarily occur depending 
on the condition of the subject being treated. The person responsible for administration will, 
in any event, determine the appropriate dose for the individual subject. Moreover, for human 
administration, preparations should meet sterility, pyrogenicity, and the general safety and 
purity standards as required by FDA Office of Biologics standards. 

Sterile injectable solutions are prepared by incorporating the active 
compounds in the required amount in the appropriate solvent with various of the other 
ingredients enumerated above, as required, followed by filtered sterilization. Generally, 
dispersions are prepared by incorporating the various sterilized active ingredients into a 
sterile vehicle which contains the basic dispersion medium and the required other ingredients 
from those enumerated above. In the case of sterile powders for the preparation of sterile 
injectable solutions, the preferred methods of preparation are vacuum-drying and 
freeze-drying techniques which yield a powder of the active ingredient plus any additional desired 
ingredient from a previously sterile-filtered solution thereof. 

The compositions disclosed herein may be formulated in a neutral or salt form. 
Pharmaceutically-acceptable salts include the acid addition salts (formed with the free amino 
groups of the protein), which are formed with inorganic acids such as, for example, 
hydrochloric or phosphoric acids, or such organic acids as acetic, oxalic, tartaric, mandelic, 
and the like. Salts formed with the free carboxyl groups can also be derived from inorganic 
bases such as, for example, sodium, potassium, ammonium, calcium, or ferric hydroxides, 
and such organic bases as isopropylamine, trimethylamine, histidine, procaine and the like. 
Upon formulation, solutions will be administered in a manner compatible with the dosage 
formulation and in such amount as is therapeutically effective. The formulations are easily 
administered in a variety of dosage forms such as injectable solutions, drug-release capsules, 
and the like. 

As used herein, "carrier" includes any and all solvents, dispersion media, 
vehicles, coatings, diluents, antibacterial and antifungal agents, isotonic and absorption 
delaying agents, buffers, carrier solutions, suspensions, colloids, and the like. The use of 
such media and agents for pharmaceutical active substances is well known in the art. Except 
insofar as any conventional media or agent is incompatible with the active ingredient, its use 
in the therapeutic compositions is contemplated. Supplementary active ingredients can also 
be incorporated into the compositions. 

The phrase "pharmaceutically-acceptable" refers to molecular entities and 
compositions that do not produce an allergic or similar untoward reaction when administered 
to a human. The preparation of an aqueous composition that contains a protein as an active 
ingredient is well understood in the art. Typically, such compositions are prepared as 
injectables, either as liquid solutions or suspensions; solid forms suitable for solution in, or 
suspension in, liquid prior to injection can also be prepared. The preparation can also be 
emulsified. 

C. Nasal Delivery

In certain embodiments, the pharmaceutical compositions may be delivered by 
intranasal sprays, inhalation, and/or other aerosol delivery vehicles. Methods for delivering
genes, nucleic acids, and peptide compositions directly to the lungs via nasal aerosol sprays
have been described, e.g., in U.S. Patent 5,756,353 and U.S. Patent 5,804,212 (each
specifically incorporated herein by reference in its entirety). Likewise, the delivery of drugs
using intranasal microparticle resins (Takenaga et al., 1998) and lysophosphatidyl-glycerol
compounds (U.S. Patent 5,725,871, specifically incorporated herein by reference in its
entirety) is also well known in the pharmaceutical arts. Likewise, transmucosal drug
delivery in the form of a polytetrafluoroethylene support matrix is described in U.S. Patent
5,780,045 (specifically incorporated herein by reference in its entirety).

D. Liposome-, Nanocapsule-, and Microparticle-Mediated Delivery 

In certain embodiments, the inventors contemplate the use of liposomes, 
nanocapsules, microparticles, microspheres, lipid particles, vesicles, and the like, for the 
introduction of the compositions of the present invention into suitable host cells. In 
particular, the compositions of the present invention may be formulated for delivery either
encapsulated in a lipid particle, a liposome, a vesicle, a nanosphere, or a nanoparticle or the 
like.

Such formulations may be preferred for the introduction of pharmaceutically-acceptable
formulations of the nucleic acids or constructs disclosed herein. The formation
and use of liposomes is generally known to those of skill in the art (see, for example,
Couvreur et al., 1977; Couvreur, 1988; Lasic, 1998, which describe the use of liposomes and
nanocapsules in the targeted antibiotic therapy for intracellular bacterial infections and
diseases). Recently, liposomes were developed with improved serum stability and circulation
half-times (Gabizon & Papahadjopoulos, 1988; Allen and Choun, 1987; U.S. Patent
5,741,516, specifically incorporated herein by reference in its entirety). Further, various
methods of liposome and liposome-like preparations as potential drug carriers have been
reviewed (Takakura, 1998; Chandran et al., 1997; Margalit, 1995; U.S. Patent 5,567,434; U.S.
Patent 5,552,157; U.S. Patent 5,565,213; U.S. Patent 5,738,868 and U.S. Patent
5,795,587, each specifically incorporated herein by reference in its entirety).

Liposomes have been used successfully with a number of cell types that are
normally resistant to transfection by other procedures including T cell suspensions, primary
hepatocyte cultures and PC12 cells (Renneisen et al., 1990; Muller et al., 1990). In addition,
liposomes are free of the DNA length constraints that are typical of viral-based delivery
systems. Liposomes have been used effectively to introduce genes, drugs (Heath & Martin,
1986; Heath et al., 1986; Balazsovits et al., 1989; Fresta & Puglisi, 1996), radiotherapeutic
agents (Pikul et al., 1987), enzymes (Imaizumi et al., 1990a; Imaizumi et al., 1990b), viruses
(Faller & Baltimore, 1984), transcription factors and allosteric effectors (Nicolau &
Gersonde, 1979) into a variety of cultured cell lines and animals. In addition, several
successful clinical trials examining the effectiveness of liposome-mediated drug delivery
have been completed (Lopez-Berestein et al., 1985a; 1985b; Coune, 1988; Soulier et al.,
1988). Furthermore, several studies suggest that the use of liposomes is not associated with
autoimmune responses, toxicity or gonadal localization after systemic delivery (Mori &
Fukatsu, 1992).

Liposomes are formed from phospholipids that are dispersed in an aqueous
medium and spontaneously form multilamellar concentric bilayer vesicles (also termed
multilamellar vesicles (MLVs)). MLVs generally have diameters of from 25 nm to 4 µm.
Sonication of MLVs results in the formation of small unilamellar vesicles (SUVs) with
diameters in the range of 200 to 500 Å, containing an aqueous solution in the core.
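
By way of illustration only, the vesicle diameters quoted above may be placed on a common scale using 1 nm = 10 Å and 1 µm = 1000 nm. The following Python sketch merely performs that unit conversion; it forms no part of the disclosed methods, and all function and variable names are hypothetical.

```python
# Illustrative only: express the vesicle size ranges quoted in the text in nanometers.
ANGSTROM_PER_NM = 10.0   # 1 nm = 10 angstroms
NM_PER_UM = 1000.0       # 1 micrometer = 1000 nm


def angstrom_to_nm(value_angstrom):
    """Convert a length in angstroms to nanometers."""
    return value_angstrom / ANGSTROM_PER_NM


def um_to_nm(value_um):
    """Convert a length in micrometers to nanometers."""
    return value_um * NM_PER_UM


# Ranges as stated in the text (hypothetical variable names).
MLV_RANGE_NM = (25.0, um_to_nm(4.0))                            # 25 nm to 4 micrometers
SUV_RANGE_NM = (angstrom_to_nm(200.0), angstrom_to_nm(500.0))   # 200 to 500 angstroms

if __name__ == "__main__":
    print("MLV diameter range (nm):", MLV_RANGE_NM)
    print("SUV diameter range (nm):", SUV_RANGE_NM)
```

On this scale the stated SUV range corresponds to 20 to 50 nm, whereas the stated MLV range extends to 4000 nm.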

Liposomes bear resemblance to cellular membranes and are contemplated for 
use in connection with the present invention as carriers for the peptide compositions. They
are widely suitable as both water- and lipid-soluble substances can be entrapped, i.e., in the
aqueous spaces and within the bilayer itself, respectively. It is possible that the drug-bearing
liposomes may even be employed for site-specific delivery of active agents by selectively
modifying the liposomal formulation.

In addition to the teachings of Couvreur et al. (1977; 1988), the following
information may be utilized in generating liposomal formulations. Phospholipids can form a
variety of structures other than liposomes when dispersed in water, depending on the molar 
ratio of lipid to water. At low ratios the liposome is the preferred structure. The physical 
characteristics of liposomes depend on pH, ionic strength and the presence of divalent
cations. Liposomes can show low permeability to ionic and polar substances, but at elevated
temperatures undergo a phase transition which markedly alters their permeability. The phase
transition involves a change from a closely packed, ordered structure, known as the gel state,
to a loosely packed, less-ordered structure, known as the fluid state. This occurs at a
characteristic phase-transition temperature and results in an increase in permeability to ions,
sugars and drugs.

In addition to temperature, exposure to proteins can alter the permeability of 
liposomes. Certain soluble proteins, such as cytochrome c, bind, deform and penetrate the 
bilayer, thereby causing changes in permeability. Cholesterol inhibits this penetration of 
proteins, apparently by packing the phospholipids more tightly. It is contemplated that the
most useful liposome formations for antibiotic and inhibitor delivery will contain cholesterol.

The ability to trap solutes varies between different types of liposomes. For 
example, MLVs are moderately efficient at trapping solutes, but SUVs are extremely 
inefficient. SUVs offer the advantage of homogeneity and reproducibility in size distribution, 
however, and a compromise between size and trapping efficiency is offered by large
unilamellar vesicles (LUVs). These are prepared by ether evaporation and are three to four
times more efficient at solute entrapment than MLVs. 

In addition to liposome characteristics, an important determinant in entrapping 
compounds is the physicochemical properties of the compound itself. Polar compounds are 
trapped in the aqueous spaces and nonpolar compounds bind to the lipid bilayer of the
vesicle. Polar compounds are released through permeation or when the bilayer is broken, but
nonpolar compounds remain affiliated with the bilayer unless it is disrupted by temperature 
or exposure to lipoproteins. Both types show maximum efflux rates at the phase transition 
temperature.

Liposomes interact with cells via four different mechanisms: endocytosis by
phagocytic cells of the reticuloendothelial system such as macrophages and neutrophils;
adsorption to the cell surface, either by nonspecific weak hydrophobic or electrostatic forces,
or by specific interactions with cell-surface components; fusion with the plasma cell
membrane by insertion of the lipid bilayer of the liposome into the plasma membrane, with
simultaneous release of liposomal contents into the cytoplasm; and by transfer of liposomal 
lipids to cellular or subcellular membranes, or vice versa, without any association of the 
liposome contents. It often is difficult to determine which mechanism is operative and more 
than one may operate at the same time. 

The fate and disposition of intravenously injected liposomes depend on their
physical properties, such as size, fluidity, and surface charge. They may persist in tissues for
hours or days, depending on their composition, and half-lives in the blood range from minutes to
several hours. Larger liposomes, such as MLVs and LUVs, are taken up rapidly by phagocytic
cells of the reticuloendothelial system, but physiology of the circulatory system restrains the
exit of such large species at most sites. They can exit only in places where large openings or
pores exist in the capillary endothelium, such as the sinusoids of the liver or spleen. Thus,
these organs are the predominant site of uptake. On the other hand, SUVs show a broader
tissue distribution but still are sequestered highly in the liver and spleen. In general, this in
vivo behavior limits the potential targeting of liposomes to only those organs and tissues
accessible to their large size. These include the blood, liver, spleen, bone marrow, and
lymphoid organs.

Targeting is generally not a limitation in terms of the present invention. 
However, should specific targeting be desired, methods are available for this to be 
accomplished. Antibodies may be used to bind to the liposome surface and to direct the
antibody and its drug contents to specific antigenic receptors located on a particular cell-type
surface. Carbohydrate determinants (glycoprotein or glycolipid cell-surface components that
play a role in cell-cell recognition, interaction and adhesion) may also be used as recognition
sites as they have potential in directing liposomes to particular cell types. Mostly, it is
contemplated that intravenous injection of liposomal preparations would be used, but other
routes of administration are also conceivable.

Alternatively, the invention provides for pharmaceutically-acceptable 
nanocapsule formulations of the compositions of the present invention. Nanocapsules can 
generally entrap compounds in a stable and reproducible way (Henry-Michelland et al., 1987;
Quintanar-Guerrero et al., 1998; Douglas et al., 1987). To avoid side effects due to
intracellular polymeric overloading, such ultrafine particles (sized around 0.1 µm) should be
designed using polymers able to be degraded in vivo. Biodegradable polyalkyl-cyanoacrylate
nanoparticles that meet these requirements are contemplated for use in the present invention.
Such particles may be easily made, as described (Couvreur et al., 1980; 1988; zur Muhlen
et al., 1998; Zambaux et al., 1998; Pinto-Alphandry et al., 1995 and U.S. Patent 5,145,684,
specifically incorporated herein by reference in its entirety).

X. VACCINES 

In certain preferred embodiments of the present invention, vaccines are 
provided. The vaccines will generally comprise one or more pharmaceutical compositions,
such as those discussed above, in combination with an immunostimulant. An
immunostimulant may be any substance that enhances or potentiates an immune response
(antibody and/or cell-mediated) to an exogenous antigen. Examples of immunostimulants
include adjuvants, biodegradable microspheres (e.g., polylactic galactide) and liposomes (into
which the compound is incorporated; see, e.g., Fullerton, U.S. Patent No. 4,235,877).
Vaccine preparation is generally described in, for example, Powell & Newman, eds., Vaccine
Design (the subunit and adjuvant approach) (1995). Pharmaceutical compositions and
vaccines within the scope of the present invention may also contain other compounds, which
may be biologically active or inactive. For example, one or more immunogenic portions of
other tumor antigens may be present, either incorporated into a fusion polypeptide or as a
separate compound, within the composition or vaccine.

Illustrative vaccines may contain DNA encoding one or more of the 
polypeptides as described above, such that the polypeptide is generated in situ. As noted 
above, the DNA may be present within any of a variety of delivery systems known to those of 
ordinary skill in the art, including nucleic acid expression systems, bacteria and viral
expression systems. Numerous gene delivery techniques are well known in the art, such as
those described by Rolland, Crit. Rev. Therap. Drug Carrier Systems 15:143-198 (1998), and
references cited therein. Appropriate nucleic acid expression systems contain the necessary
DNA sequences for expression in the patient (such as a suitable promoter and terminating
signal). Bacterial delivery systems involve the administration of a bacterium (such as
Bacillus-Calmette-Guerin) that expresses an immunogenic portion of the polypeptide on its
cell surface or secretes such an epitope. In a preferred embodiment, the DNA may be 
introduced using a viral expression system (e.g., vaccinia or other pox virus, retrovirus, or 
adenovirus), which may involve the use of a non-pathogenic (defective), replication
competent virus. Suitable systems are disclosed, for example, in Fisher-Hoch et al., Proc.
Natl. Acad. Sci. USA 86:317-321 (1989); Flexner et al., Ann. N.Y. Acad. Sci. 569:86-103
(1989); Flexner et al., Vaccine 8:17-21 (1990); U.S. Patent Nos. 4,603,112, 4,769,330, and
5,017,487; WO 89/01973; U.S. Patent No. 4,777,127; GB 2,200,651; EP 0,345,242; WO
91/02805; Berkner, Biotechniques 6:616-627 (1988); Rosenfeld et al., Science 252:431-434
(1991); Kolls et al., Proc. Natl. Acad. Sci. USA 91:215-219 (1994); Kass-Eisler et al., Proc.
Natl. Acad. Sci. USA 90:11498-11502 (1993); Guzman et al., Circulation 88:2838-2848
(1993); and Guzman et al., Cir. Res. 73:1202-1207 (1993). Techniques for incorporating
DNA into such expression systems are well known to those of ordinary skill in the art. The
DNA may also be "naked," as described, for example, in Ulmer et al., Science 259:1745-1749
(1993) and reviewed by Cohen, Science 259:1691-1692 (1993). The uptake of naked
DNA may be increased by coating the DNA onto biodegradable beads, which are efficiently
transported into the cells. It will be apparent that a vaccine may comprise both a
polynucleotide and a polypeptide component. Such vaccines may provide for an enhanced
immune response.

It will be apparent that a vaccine may contain pharmaceutically acceptable 
salts of the polynucleotides and polypeptides provided herein. Such salts may be prepared 
from pharmaceutically acceptable non-toxic bases, including organic bases (e.g., salts of
primary, secondary and tertiary amines and basic amino acids) and inorganic bases (e.g.,
sodium, potassium, lithium, ammonium, calcium and magnesium salts).

While any suitable carrier known to those of ordinary skill in the art may be 
employed in the vaccine compositions of this invention, the type of carrier will vary 
depending on the mode of administration. Compositions of the present invention may be 
formulated for any appropriate manner of administration, including for example, topical, oral,
nasal, intravenous, intracranial, intraperitoneal, subcutaneous or intramuscular administration.
For parenteral administration, such as subcutaneous injection, the carrier preferably 
comprises water, saline, alcohol, a fat, a wax or a buffer. For oral administration, any of the 
above carriers or a solid carrier, such as mannitol, lactose, starch, magnesium stearate, 
sodium saccharine, talcum, cellulose, glucose, sucrose, and magnesium carbonate, may be
employed. Biodegradable microspheres (e.g., polylactate polyglycolate) may also be
employed as carriers for the pharmaceutical compositions of this invention. Suitable
biodegradable microspheres are disclosed, for example, in U.S. Patent Nos. 4,897,268;
5,075,109; 5,928,647; 5,811,128; 5,820,883; 5,853,763; 5,814,344 and 5,942,252. One may
also employ a carrier comprising the particulate-protein complexes described in U.S. Patent
No. 5,928,647, which are capable of inducing class I-restricted cytotoxic T lymphocyte
responses in a host.

Such compositions may also comprise buffers (e.g., neutral buffered saline or
phosphate buffered saline), carbohydrates (e.g., glucose, mannose, sucrose or dextrans),
mannitol, proteins, polypeptides or amino acids such as glycine, antioxidants, bacteriostats,
chelating agents such as EDTA or glutathione, adjuvants (e.g., aluminum hydroxide), solutes
that render the formulation isotonic, hypotonic or weakly hypertonic with the blood of a
recipient, suspending agents, thickening agents and/or preservatives. Alternatively,
compositions of the present invention may be formulated as a lyophilizate. Compounds may
also be encapsulated within liposomes using well known technology.

Any of a variety of immunostimulants may be employed in the vaccines of 
this invention. For example, an adjuvant may be included. Most adjuvants contain a 
substance designed to protect the antigen from rapid catabolism, such as aluminum hydroxide 
or mineral oil, and a stimulator of immune responses, such as lipid A, Bordetella pertussis or
Mycobacterium species or Mycobacterium-derived proteins. For example, delipidated,
deglycolipidated M. vaccae ("pVac") can be used. In another embodiment, BCG is used. In
addition, the vaccine can be administered to a subject previously exposed to BCG. Suitable
adjuvants are commercially available as, for example, Freund's Incomplete Adjuvant and
Complete Adjuvant (Difco Laboratories, Detroit, MI); Merck Adjuvant 65 (Merck and
Company, Inc., Rahway, NJ); AS-2 and derivatives thereof (SmithKline Beecham,
Philadelphia, PA); CWS, TDM, LeIF, aluminum salts such as aluminum hydroxide gel (alum)
or aluminum phosphate; salts of calcium, iron or zinc; an insoluble suspension of acylated
tyrosine; acylated sugars; cationically or anionically derivatized polysaccharides;
polyphosphazenes; biodegradable microspheres; monophosphoryl lipid A and quil A.
Cytokines, such as GM-CSF or interleukin-2, -7, or -12, may also be used as adjuvants.

Within the vaccines provided herein, the adjuvant composition is preferably 
designed to induce an immune response predominantly of the Th1 type. High levels of Th1-type
cytokines (e.g., IFN-γ, TNFα, IL-2 and IL-12) tend to favor the induction of cell
mediated immune responses to an administered antigen. In contrast, high levels of Th2-type
cytokines (e.g., IL-4, IL-5, IL-6 and IL-10) tend to favor the induction of humoral immune
responses. Following application of a vaccine as provided herein, a patient will support an
immune response that includes Th1- and Th2-type responses. Within a preferred
embodiment, in which a response is predominantly Th1-type, the level of Th1-type cytokines
will increase to a greater extent than the level of Th2-type cytokines. The levels of these
cytokines may be readily assessed using standard assays. For a review of the families of
cytokines, see Mosmann & Coffman, Ann. Rev. Immunol. 7:145-173 (1989).

Preferred adjuvants for use in eliciting a predominantly Th1-type response
include, for example, a combination of monophosphoryl lipid A, preferably 3-de-O-acylated
monophosphoryl lipid A (3D-MPL), together with an aluminum salt. MPL adjuvants are
available from Corixa Corporation (Seattle, WA; see US Patent Nos. 4,436,727; 4,877,611;
4,866,034 and 4,912,094). CpG-containing oligonucleotides (in which the CpG dinucleotide
is unmethylated) also induce a predominantly Th1 response. Such oligonucleotides are well
known and are described, for example, in WO 96/02555, WO 99/33488 and U.S. Patent Nos.
6,008,200 and 5,856,462. Immunostimulatory DNA sequences are also described, for
example, by Sato et al., Science 273:352 (1996). Another preferred adjuvant comprises a
saponin, such as Quil A, or derivatives thereof, including QS21 and QS7 (Aquila
Biopharmaceuticals Inc., Framingham, MA); Escin; Digitonin; or Gypsophila or
Chenopodium quinoa saponins. Other preferred formulations include more than one saponin
in the adjuvant combinations of the present invention, for example combinations of at least
two of the following group comprising QS21, QS7, Quil A, β-escin, or digitonin.

Alternatively the saponin formulations may be combined with vaccine vehicles 
composed of chitosan or other polycationic polymers, polylactide and polylactide-co-glycolide
particles, poly-N-acetyl glucosamine-based polymer matrix, particles composed of
polysaccharides or chemically modified polysaccharides, liposomes and lipid-based particles,
particles composed of glycerol monoesters, etc. The saponins may also be formulated in the
presence of cholesterol to form particulate structures such as liposomes or ISCOMs.
Furthermore, the saponins may be formulated together with a polyoxyethylene ether or ester,
in either a non-particulate solution or suspension, or in a particulate structure such as a
paucilamellar liposome or ISCOM. The saponins may also be formulated with excipients such
as Carbopol® to increase viscosity, or may be formulated in a dry powder form with a powder
excipient such as lactose.

In one preferred embodiment, the adjuvant system includes the combination of
a monophosphoryl lipid A and a saponin derivative, such as the combination of QS21 and
3D-MPL® adjuvant, as described in WO 94/00153, or a less reactogenic composition where
the QS21 is quenched with cholesterol, as described in WO 96/33739. Other preferred
formulations comprise an oil-in-water emulsion and tocopherol. Another particularly
preferred adjuvant formulation employing QS21, 3D-MPL adjuvant and tocopherol in an
oil-in-water emulsion is described in WO 95/17210.

Another enhanced adjuvant system involves the combination of a CpG-containing
oligonucleotide and a saponin derivative, particularly the combination of CpG and
QS21 as disclosed in WO 00/09159. Preferably the formulation additionally comprises an oil
in water emulsion and tocopherol.

Other preferred adjuvants include Montanide ISA 720 (Seppic, France), SAF
(Chiron, California, United States), ISCOMS (CSL), MF-59 (Chiron), the SBAS series of
adjuvants (e.g., SBAS-2, AS2', AS2", SBAS-4, or SBAS6, available from SmithKline
Beecham, Rixensart, Belgium), Detox (Corixa, Hamilton, MT), RC-529 (Corixa, Hamilton,
MT) and other aminoalkyl glucosaminide 4-phosphates (AGPs), such as those described in
pending U.S. Patent Application Serial Nos. 08/853,826 and 09/074,720, the disclosures of
which are incorporated herein by reference in their entireties, and polyoxyethylene ether
adjuvants such as those described in WO 99/52549A1.

Other preferred adjuvants include adjuvant molecules of the general formula

(I): HO(CH₂CH₂O)ₙ-A-R,

wherein n is 1-50, A is a bond or -C(O)-, R is C₁₋₅₀ alkyl or phenyl C₁₋₅₀ alkyl.

One embodiment of the present invention consists of a vaccine formulation
comprising a polyoxyethylene ether of general formula (I), wherein n is between 1 and 50,
preferably 4-24, most preferably 9; the R component is C₁₋₅₀, preferably C₄-C₂₀ alkyl and
most preferably C₁₂ alkyl, and A is a bond. The concentration of the polyoxyethylene ethers
should be in the range 0.1-20%, preferably from 0.1-10%, and most preferably in the range
0.1-1%. Preferred polyoxyethylene ethers are selected from the following group:
polyoxyethylene-9-lauryl ether, polyoxyethylene-9-stearyl ether, polyoxyethylene-8-stearyl
ether, polyoxyethylene-4-lauryl ether, polyoxyethylene-35-lauryl ether, and
polyoxyethylene-23-lauryl ether. Polyoxyethylene ethers such as polyoxyethylene lauryl ether
are described in the Merck Index (12th edition: entry 7717). These adjuvant molecules are
described in WO 99/52549.
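
By way of illustration only, the nested ranges stated above for n and for the concentration of the polyoxyethylene ether may be expressed as a simple lookup. The following Python sketch is not part of the disclosed formulations and is not a statement of claim scope; the function names are hypothetical, and treating each stated range as inclusive of its end points is an assumption.

```python
# Illustrative only: report the narrowest stated range into which a value falls.
# Assumption: every range is treated as inclusive of both end points.

def classify_range(value, broad, preferred, most_preferred):
    """Return the label of the narrowest (low, high) range containing value."""
    for label, (low, high) in (("most preferred", most_preferred),
                               ("preferred", preferred),
                               ("broad", broad)):
        if low <= value <= high:
            return label
    return "outside stated ranges"


def describe_ether(n, percent):
    """Classify n (ethylene oxide units) and concentration (%) per the text."""
    return {
        # n: 1-50, preferably 4-24, most preferably 9
        "n": classify_range(n, (1, 50), (4, 24), (9, 9)),
        # concentration: 0.1-20%, preferably 0.1-10%, most preferably 0.1-1%
        "concentration": classify_range(percent, (0.1, 20.0), (0.1, 10.0), (0.1, 1.0)),
    }


if __name__ == "__main__":
    # e.g., a polyoxyethylene-9-lauryl ether at a hypothetical 0.5% concentration
    print(describe_ether(9, 0.5))
    # e.g., a polyoxyethylene-35-lauryl ether at a hypothetical 15% concentration
    print(describe_ether(35, 15.0))
```

Because the ranges are nested, the lookup tests the narrowest range first, so that a value lying in all three ranges is reported under the most specific label.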

The polyoxyethylene ether according to the general formula (I) above may, if 
desired, be combined with another adjuvant. For example, a preferred adjuvant combination
is with CpG as described in the pending UK patent application GB 9820956.2.

Any vaccine provided herein may be prepared using well known methods that 
result in a combination of antigen, immune response enhancer and a suitable carrier or 
excipient. The compositions described herein may be administered as part of a sustained
release formulation (i.e., a formulation such as a capsule, sponge or gel (composed of
polysaccharides, for example) that effects a slow release of compound following
administration). Such formulations may generally be prepared using well known technology
(see, e.g., Coombes et al., Vaccine 14:1429-1438 (1996)) and administered by, for example,
oral, rectal or subcutaneous implantation, or by implantation at the desired target site. 
Sustained-release formulations may contain a polypeptide, polynucleotide or antibody 
dispersed in a carrier matrix and/or contained within a reservoir surrounded by a rate 
controlling membrane. 

Carriers for use within such formulations are biocompatible, and may also be 
biodegradable; preferably the formulation provides a relatively constant level of active 
component release. Such carriers include microparticles of poly(lactide-co-glycolide), 
polyacrylate, latex, starch, cellulose, dextran and the like. Other delayed-release carriers 
include supramolecular biovectors, which comprise a non-liquid hydrophilic core (e.g., a 
cross-linked polysaccharide or oligosaccharide) and, optionally, an external layer comprising 
an amphiphilic compound, such as a phospholipid (see, e.g., U.S. Patent No. 5,151,254 and 
PCT applications WO 94/20078, WO 94/23701 and WO 96/06638). The amount of active
compound contained within a sustained release formulation depends upon the site of 
implantation, the rate and expected duration of release and the nature of the condition to be 
treated or prevented. 

Any of a variety of delivery vehicles may be employed within pharmaceutical 
compositions and vaccines to facilitate production of an antigen-specific immune response 
that targets tumor cells. Delivery vehicles include antigen presenting cells (APCs), such as 
dendritic cells, macrophages, B cells, monocytes and other cells that may be engineered to be 
efficient APCs. Such cells may, but need not, be genetically modified to increase the 
capacity for presenting the antigen, to improve activation and/or maintenance of the T cell 
response, to have anti-tumor effects per se and/or to be immunologically compatible with the 
receiver (i.e., matched HLA haplotype). APCs may generally be isolated from any of a 
variety of biological fluids and organs, including tumor and peritumoral tissues, and may be 
autologous, allogeneic, syngeneic or xenogeneic cells. 

Certain preferred embodiments of the present invention use dendritic cells or 
progenitors thereof as antigen-presenting cells. Dendritic cells are highly potent APCs 
(Banchereau & Steinman, Nature 392:245-251 (1998)) and have been shown to be effective
as a physiological adjuvant for eliciting prophylactic or therapeutic antitumor immunity (see 
Timmerman & Levy, Ann. Rev. Med. 50:507-529 (1999)). In general, dendritic cells may be
identified based on their typical shape (stellate in situ, with marked cytoplasmic processes
(dendrites) visible in vitro), their ability to take up, process and present antigens with high
efficiency and their ability to activate naive T cell responses. Dendritic cells may, of course,
be engineered to express specific cell-surface receptors or ligands that are not commonly
found on dendritic cells in vivo or ex vivo, and such modified dendritic cells are contemplated
by the present invention. As an alternative to dendritic cells, vesicles secreted by antigen-loaded
dendritic cells (called exosomes) may be used within a vaccine (see Zitvogel et al., Nature
Med. 4:594-600 (1998)).

Dendritic cells and progenitors may be obtained from peripheral blood, bone 
marrow, tumor-infiltrating cells, peritumoral tissues-infiltrating cells, lymph nodes, spleen, 
skin, umbilical cord blood or any other suitable tissue or fluid. For example, dendritic cells 
may be differentiated ex vivo by adding a combination of cytokines such as GM-CSF, IL-4, 
IL-13 and/or TNFα to cultures of monocytes harvested from peripheral blood. Alternatively,
CD34 positive cells harvested from peripheral blood, umbilical cord blood or bone marrow
may be differentiated into dendritic cells by adding to the culture medium combinations of
GM-CSF, IL-3, TNFα, CD40 ligand, LPS, flt3 ligand and/or other compound(s) that induce
differentiation, maturation and proliferation of dendritic cells. 

Dendritic cells are conveniently categorized as "immature" and "mature" cells, 
which allows a simple way to discriminate between two well characterized phenotypes. 
However, this nomenclature should not be construed to exclude all possible intermediate 
stages of differentiation. Immature dendritic cells are characterized as APCs with a high
capacity for antigen uptake and processing, which correlates with the high expression of Fcγ
receptor and mannose receptor. The mature phenotype is typically characterized by a lower
expression of these markers, but a high expression of cell surface molecules responsible for T
cell activation such as class I and class II MHC, adhesion molecules (e.g., CD54 and CD11)
and costimulatory molecules (e.g., CD40, CD80, CD86 and 4-1BB).

APCs may generally be transfected with a polynucleotide encoding a protein 
(or portion or other variant thereof) such that the polypeptide, or an immunogenic portion 
thereof, is expressed on the cell surface. Such transfection may take place ex vivo, and a 
composition or vaccine comprising such transfected cells may then be used for therapeutic 
purposes, as described herein. Alternatively, a gene delivery vehicle that targets a dendritic 
or other antigen presenting cell may be administered to a patient, resulting in transfection that 
occurs in vivo. In vivo and ex vivo transfection of dendritic cells, for example, may generally
be performed using any methods known in the art, such as those described in WO 97/24447,
or the gene gun approach described by Mahvi et al., Immunology and Cell Biology 75:456-460
(1997). Antigen loading of dendritic cells may be achieved by incubating dendritic cells
or progenitor cells with the polypeptide, DNA (naked or within a plasmid vector) or RNA; or
with antigen-expressing recombinant bacteria or viruses (e.g., vaccinia, fowlpox,
adenovirus or lentivirus vectors). Prior to loading, the polypeptide may be covalently 
conjugated to an immunological partner that provides T cell help (e.g., a carrier molecule). 
Alternatively, a dendritic cell may be pulsed with a non-conjugated immunological partner, 
separately or in the presence of the polypeptide. 

Vaccines and pharmaceutical compositions may be presented in unit-dose or 
multi-dose containers, such as sealed ampoules or vials. Such containers are preferably 
hermetically sealed to preserve sterility of the formulation until use. In general, formulations 
may be stored as suspensions, solutions or emulsions in oily or aqueous vehicles. 
Alternatively, a vaccine or pharmaceutical composition may be stored in a freeze-dried 
condition requiring only the addition of a sterile liquid carrier immediately prior to use. 

XI. DIAGNOSTIC KITS 

The present invention further provides kits for use within any of the above 
diagnostic methods. Such kits typically comprise two or more components necessary for 
performing a diagnostic assay. Components may be compounds, reagents, containers and/or 
equipment. For example, one container within a kit may contain a monoclonal antibody or 
fragment thereof that specifically binds to a protein. Such antibodies or fragments may be 
provided attached to a support material, as described above. One or more additional 
containers may enclose elements, such as reagents or buffers, to be used in the assay. Such 
kits may also, or alternatively, contain a detection reagent as described above that contains a 
reporter group suitable for direct or indirect detection of antibody binding. 

Alternatively, a kit may be designed to detect the level of mRNA encoding a 
protein in a biological sample. Such kits generally comprise at least one oligonucleotide 
probe or primer, as described above, that hybridizes to a polynucleotide encoding a protein. 
Such an oligonucleotide may be used, for example, within a PCR or hybridization assay. 
Additional components that may be present within such kits include a second oligonucleotide 
and/or a diagnostic reagent or container to facilitate the detection of a polynucleotide 
encoding a protein of the invention.

All publications and patent applications cited in this specification are herein
incorporated by reference as if each individual publication or patent application were 
specifically and individually indicated to be incorporated by reference. 

Although the foregoing invention has been described in some detail by way of 
illustration and example for purposes of clarity of understanding, it will be readily apparent to 
one of ordinary skill in the art in light of the teachings of this invention that certain changes 
and modifications may be made thereto without departing from the spirit or scope of the 
appended claims. 

XII. EXAMPLES 

The following examples are provided by way of illustration only and not by 
way of limitation. Those of skill in the art will readily recognize a variety of noncritical 
parameters that could be changed or modified to yield essentially similar results. 

Example 1: Recombinant Fusion Proteins of M. tuberculosis Antigens Exhibit Increased 
Serological Sensitivity 

A. Materials and Methods 

1. Construction of vectors encoding fusion proteins: TbF14

TbF14 is a fusion protein in which the amino acid sequence of the MTb81
antigen is fused to the amino acid sequence of the Mo2 antigen. A sequence encoding
Mo2 was PCR amplified with the following primers: PDM-294 (Tm 64°C)
CGTAATCACGTGCAGAAGTACGGCGGATC (SEQ ID NO:14) and PDM-295 (Tm 63°C)
CCGACTAGAATTCACTATTGACAGGCCCATC (SEQ ID NO:15).

DNA amplification was performed using 10 ul 10X Pfu buffer, 1 ul 10 mM 
dNTPs, 2 ul each of the PCR primers at 10 uM concentration, 83 ul water, 1.5 ul Pfu DNA 
polymerase (Stratagene, La Jolla, CA) and 50 ng DNA template. For Mo2 antigen, 
denaturation at 96°C was performed for 2 min; followed by 40 cycles of 96°C for 20 sec, 
63°C for 15 sec and 72°C for 2.5 min; and finally by 72°C for 5 min. 
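
The composition of the reaction mix above can be checked with simple dilution arithmetic. The following is a minimal sketch and not part of the original protocol: the helper name final_concentration is hypothetical, and because the volume of the 50 ng template is not stated, it is left out of the total.

```python
# Final concentrations in the Pfu PCR mix described above (illustrative arithmetic only).
# Volumes are in microliters; the template volume is not stated in the text and is ignored.

def final_concentration(stock, volume_ul, total_ul):
    """Dilution formula: C_final = C_stock * V_added / V_total."""
    return stock * volume_ul / total_ul

volumes_ul = {
    "10X Pfu buffer": 10.0,
    "10 mM dNTPs": 1.0,
    "primer 1 (10 uM)": 2.0,
    "primer 2 (10 uM)": 2.0,
    "water": 83.0,
    "Pfu DNA polymerase": 1.5,
}
total_ul = sum(volumes_ul.values())

buffer_x = final_concentration(10.0, volumes_ul["10X Pfu buffer"], total_ul)    # in "X"
dntp_mM = final_concentration(10.0, volumes_ul["10 mM dNTPs"], total_ul)        # in mM
primer_uM = final_concentration(10.0, volumes_ul["primer 1 (10 uM)"], total_ul)  # in uM

print(f"total volume: {total_ul:.1f} ul")
print(f"buffer: {buffer_x:.2f}X, dNTPs: {dntp_mM:.3f} mM, each primer: {primer_uM:.2f} uM")
```

Under these assumptions the mix comes to 99.5 ul, with roughly 1X buffer, 0.1 mM dNTPs and 0.2 uM of each primer.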

A sequence encoding MTb81 was PCR amplified with the following primers: 
PDM-268 (Tm 66°C) CTAAGTAGTACTGATCGCGTGTCGGTGGGC (SEQ ID NO:16)
and PDM-296 (Tm 64°C) CATCGATAGGCCTGGCCGCATCGTCACC (SEQ ID NO:17).
The amplification reaction was performed using the same mix as above, as follows:
denaturation at 96°C for 2 min; followed by 40 cycles of 96°C for 20 sec, 65°C for 15 sec
and 72°C for 5 min; and finally by 72°C for 5 min.
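
The two cycling programs above differ mainly in extension time, and their nominal durations can be compared as follows. This is a rough sketch: the function name program_seconds is hypothetical, and instrument ramp times between temperatures are not given in the text and are ignored.

```python
# Nominal hold time of a thermal cycling program (ramp times between steps are ignored).

def program_seconds(initial_s, cycle_steps_s, cycles, final_s):
    """Initial denaturation + cycles * (sum of per-cycle holds) + final extension."""
    return initial_s + cycles * sum(cycle_steps_s) + final_s

# Mo2: 96°C 2 min; 40 x (96°C 20 s, 63°C 15 s, 72°C 2.5 min); 72°C 5 min
mo2_s = program_seconds(120, [20, 15, 150], 40, 300)
# MTb81: 96°C 2 min; 40 x (96°C 20 s, 65°C 15 s, 72°C 5 min); 72°C 5 min
mtb81_s = program_seconds(120, [20, 15, 300], 40, 300)

print(f"Mo2 program:   {mo2_s} s ({mo2_s / 60:.1f} min)")
print(f"MTb81 program: {mtb81_s} s ({mtb81_s / 60:.1f} min)")
```

On these figures the Mo2 program holds for 7,820 seconds (about 130 minutes) and the MTb81 program for 13,820 seconds (about 230 minutes).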



The Mo2 PCR product was digested with Eco72I (Stratagene, La Jolla, CA)
and EcoRI (NEB, Beverly, MA). The MTb81 PCR product was digested with FseI and StuI
(NEB, Beverly, MA). These two products were then cloned, in a three-way ligation, into an
expression plasmid (a modified pET28 vector with a hexahistidine tag in frame) that had been
digested with FseI and EcoRI. The sequence was confirmed, then the expression plasmid
was transformed into the BL21pLysE E. coli strain (Novagen, Madison, WI) for expression
of the recombinant protein.
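
Restriction sites engineered into the TbF14 primers can be located with a simple string search. The sketch below is illustrative only: the primer sequences are copied from the text above, the recognition sequences are the commonly published ones for each enzyme, and the function name find_sites is hypothetical.

```python
# Recognition sequences (5'->3') for enzymes named in this example. All are palindromic,
# so scanning the primer strand as written is sufficient.
SITES = {
    "Eco72I": "CACGTG",
    "EcoRI": "GAATTC",
    "ScaI": "AGTACT",
    "StuI": "AGGCCT",
    "FseI": "GGCCGGCC",
}

# Primer sequences as given in the text for the Mo2 and MTb81 amplifications.
PRIMERS = {
    "PDM-294": "CGTAATCACGTGCAGAAGTACGGCGGATC",
    "PDM-295": "CCGACTAGAATTCACTATTGACAGGCCCATC",
    "PDM-268": "CTAAGTAGTACTGATCGCGTGTCGGTGGGC",
    "PDM-296": "CATCGATAGGCCTGGCCGCATCGTCACC",
}

def find_sites(seq, sites=SITES):
    """Return {enzyme: [0-based start positions]} for every site present in seq."""
    hits = {}
    for enzyme, site in sites.items():
        positions = []
        start = seq.find(site)
        while start != -1:
            positions.append(start)
            start = seq.find(site, start + 1)
        if positions:
            hits[enzyme] = positions
    return hits

report = {name: find_sites(seq) for name, seq in PRIMERS.items()}
for name, hits in report.items():
    print(name, hits)
```

Run on these four primers, the search reports an Eco72I site in PDM-294, an EcoRI site in PDM-295, a ScaI site in PDM-268 and a StuI site in PDM-296; no FseI site is found in any of the primer sequences as printed.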

2. Construction of vectors encoding fusion proteins: TbF15 
TbF15 is a fusion of antigens Ra3, 38 kD (with an N-terminal cysteine), 38-1,
and FL TbH4 from Mycobacterium tuberculosis, and was prepared as follows. TbF15 was
made using the fusion constructs TbF6 and TbF10.

TbF6 was made as follows (see PCT/US99/03268 and PCT/US99/03265). 
First, the FL (full-length) TbH4 coding region was PCR amplified with the following
primers: PDM-157 CTAGTTAGTACTCAGTCGCAGACCGTG (SEQ ID NO:18) (Tm 61°C)
and PDM-160 GCAGTGACGAATTCACTTCGACTCC (SEQ ID NO:19) (Tm 59°C), using
the following conditions: 10 ul 10X Pfu buffer, 1 ul 10 mM dNTPs, 2 ul 10 uM each oligo,
82 ul sterile water, 1.5 ul Accuzyme (ISC, Kaysville, UT), 200 ng Mycobacterium
tuberculosis genomic DNA. Denaturation at 96°C was performed for 2 minutes; followed by
40 cycles of 96°C for 20 seconds, 61°C for 15 seconds, and 72°C for 5 minutes; and finally
by 72°C for 10 minutes.

The PCR product was digested with ScaI and EcoRI and cloned into
pET28Ra3/38kD/38-1A, described below, which was digested with DraI and EcoRI.

pET28Ra3/38kD/38-1A was made by inserting a DraI site at the end of 38-1
before the stop codon using the following conditions. The 38-1 coding region was PCR
amplified with the following primers: PDM-69
GGATCCAGCGCTGAGATGAAGACCGATGCCGCT (SEQ ID NO: 19) (Tm 68°C) and
PDM-83 GGATATCTGCAGAATTCAGGTTTAAAGCCCATTTGCGA (SEQ ID NO:20)
(Tm 64°C), using the following conditions: 10 ul 10X Pfu buffer, 1 ul 10 mM dNTPs, 2 ul 10
uM each oligo, 82 ul sterile water, 1.5 ul Accuzyme (ISC, Kaysville, UT), 50 ng plasmid
DNA. Denaturation at 96°C was performed for 2 minutes; followed by forty cycles of 96°C
for 20 seconds, 66°C for 15 seconds and 72°C for 1 minute 10 seconds; and finally 72°C for 4
minutes.

The 38-1 PCR product was digested with Eco47III and EcoRI and cloned into
the pT7AL2Ra3/38kD construct (described in WO/9816646 and WO/9816645) which was
digested with EcoRI and Eco47III. The correct construct was confirmed through sequence
analysis. The Ra3/38kD/38-1A coding region was then subcloned into pET28 His (a
modified pET28 vector) at the NdeI and EcoRI sites. The correct construct (called TbF6)
was confirmed through sequence analysis.

Fusion construct TbF10, which replaces the N-terminal cysteine of 38 kD, was
made as follows. To replace the cysteine residue at the N-terminus, the 38kD-38-1 coding
region from the TbF fusion (described in WO/9816646 and WO/9816645) was amplified
using the following primers: PDM-192 TGTGGCTCGAAACCACCGAGCGGTTC (SEQ ID
NO:21) (Tm 64°C) and PDM-60 GAGAGAATTCTCAGAAGCCCATTTGCGAGGACA
(SEQ ID NO:22) (Tm 64°C), using the following conditions: 10 ul 10X Pfu buffer, 1 ul 10
mM dNTPs, 2 ul 10 uM each oligo, 83 ul sterile water, 1.5 ul Pfu DNA polymerase
(Stratagene, La Jolla, CA), and 50 ng plasmid TbF DNA. The amplification reaction was
performed as follows: 96°C for 2 minutes; followed by 40 cycles of 96°C for 20 seconds,
64°C for 15 seconds, and 72°C for 4 minutes; and finally 72°C for 4 minutes. The PCR product
was digested with EcoRI and cloned into pT7AL2Ra3 that had been digested with StuI and EcoRI.
The resulting construct was digested with NdeI and EcoRI and cloned into pET28 at those sites. The
resulting clone (called TbF10) is TbF plus a cysteine at the 5' end of the 38kD coding
region. It was transformed into BL21 and HMS174 with pLysS.

The pET28TbF6 (TbF6, described above) construct was digested with StuI
(NEB, Beverly, MA) and EcoRI, which released a 1.76 kb insert containing the very back
portion of the 38 kD/38-1/FL TbH4 fusion region. This insert was gel purified. The
pET28TbF10 construct (TbF10, described above) was digested with the same enzymes to
release the vector backbone, a 6.45 kb fragment containing the his-tag, the Ra3 coding region and
most of the A38kD coding region. This backbone was also gel purified. The insert and vector were
ligated and transformed. The correct construct, called TbF15, was confirmed through
sequence analysis, then transformed into the BL21 pLysS E. coli strain (Novagen, Madison
WI). This fusion protein contained the original Cys at the amino terminus of the 38 kD
protein.

B. Expression of fusion proteins

1. Expression of fusion proteins

The recombinant proteins were expressed in E. coli with six histidine residues 
at the amino-terminal portion using the pET plasmid vector and a T7 RNA polymerase 
expression system (Novagen, Madison, WI). E. coli strain BL21 (DE3) pLysE (Novagen) 
was used for high level expression. The recombinant (His-Tag) fusion proteins were purified 
from the soluble supernatant or the insoluble inclusion body of 1 L of IPTG induced batch 
cultures by affinity chromatography using the one step QIAexpress Ni-NTA Agarose matrix 
(QIAGEN, Chatsworth, CA) in the presence of 8M urea. 

Briefly, 20 ml of an overnight saturated culture of BL21 containing the pET
construct was added into 1 L of 2x YT media containing 30 µg/ml kanamycin and 34 µg/ml
chloramphenicol, and grown at 37°C with shaking. The bacterial cultures were induced with 1
mM IPTG at an OD 560 of 0.3 and grown for an additional 3 h (OD = 1.3 to 1.9). Cells were
harvested from 1 L batch cultures by centrifugation and resuspended in 20 ml of binding
buffer (0.1 M sodium phosphate, pH 8.0; 10 mM Tris-HCl, pH 8.0) containing 2 mM PMSF
and 20 µg/ml leupeptin plus one complete protease inhibitor tablet (Boehringer Mannheim)
per 25 ml. E. coli was lysed by freeze-thaw followed by brief sonication, then spun at 12,000
rpm for 30 min to pellet the inclusion bodies.

The inclusion bodies were washed three times in 1% CHAPS in 10 mM Tris-HCl
(pH 8.0). This step greatly reduced the level of contaminating LPS. The inclusion body
was finally solubilized in 20 ml of binding buffer containing 8 M urea; alternatively, 8 M urea
was added directly into the soluble supernatant. Recombinant fusion proteins with His-Tag residues
were batch bound to Ni-NTA agarose resin (5 ml resin per 1 L inductions) by rocking at 
room temperature for 1 h and the complex passed over a column. The flow through was 
passed twice over the same column and the column washed three times with 30 ml each of 
wash buffer (0.1 M sodium phosphate and 10 mM Tris-HCl, pH 6.3) also containing 8 M 
urea. Bound protein was eluted with 30 ml of 150 mM imidazole in wash buffer and 5 ml 
fractions collected. Fractions containing each recombinant fusion protein were pooled, 
dialyzed against 10 mM Tris-HCl (pH 8.0), bound one more time to the Ni-NTA matrix,
eluted and dialyzed in 10 mM Tris-HCl (pH 7.8). The yield of recombinant protein varied
from 25 to 150 mg per liter of induced bacterial culture with greater than 98% purity.
Recombinant proteins were assayed for endotoxin contamination using the Limulus assay 
(BioWhittaker) and were shown to contain < 100 E.U./mg.

2. Serological assays

ELISA assays were performed with TbF14 and TbF15 using methods known to those of
skill in the art, with 200 ng/well of antigen.

3. Results 

The TbF15 fusion protein containing TbRa3, 38kD (with N terminal cysteine), 
Tb38-1, and full length (FL) TbH4 as described above was used as the solid phase antigen in 
ELISA. The ELISA protocol is as described above. The fusion recombinant was coated at 
200 ng/well. A panel of sera was chosen from a group of TB patients that had previously
been shown by ELISA to be positive or borderline positive with these antigens. Such a panel 
enabled the direct comparison of the fusions with and without the cysteine residue in the 38 
kD component. The data are outlined in Figure 5. A total of 23 TB sera were studied and of 
these 20/23 were detected by TbF6 versus 22/23 for TbF15. Improvements in reactivity were 
seen in the low reactive samples when TbF15 was used. 
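
The detection counts reported above can be expressed as percentages. The following is a minimal sketch using only the counts stated in the text (20/23 for TbF6 and 22/23 for TbF15); the function name detection_rate is hypothetical.

```python
# Fraction of TB sera detected by each fusion protein, from the counts given above.

def detection_rate(detected, total):
    """Return the fraction of sera scored positive."""
    if total <= 0:
        raise ValueError("total must be positive")
    return detected / total

TOTAL_SERA = 23
tbf6_rate = detection_rate(20, TOTAL_SERA)
tbf15_rate = detection_rate(22, TOTAL_SERA)

print(f"TbF6:  {100 * tbf6_rate:.1f}%")
print(f"TbF15: {100 * tbf15_rate:.1f}%")
print(f"additional sera detected by TbF15: {22 - 20}")
```

On this panel the rates are about 87.0% for TbF6 and 95.7% for TbF15, a difference of two sera out of 23.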

One of skill in the art will appreciate that the order of the individual antigens 
within each fusion protein may be changed and that comparable activity would be expected 
provided that each of the epitopes is still functionally available. In addition, truncated forms 
of the proteins containing active epitopes may be used in the construction of fusion proteins. 

Example 2: Cloning, construction, and expression of HTCC#1 full-length, overlapping 
halves, and deletions as fusion constructs 

HTCC#1 (aka MTb40) was cloned by direct T cell expression screening using 
a T cell line derived from a healthy PPD positive donor to directly screen an E. coli based 
MTb expression library. 

A. Construction and screening of the plasmid expression library 

Genomic DNA from M. tuberculosis Erdman strain was randomly sheared to 
an average size of 2 kb and blunt ended with Klenow polymerase, before EcoRI adaptors 
were added. The insert was subsequently ligated into the λ screen phage vector and packaged
in vitro using the PhageMaker extract (Novagen). The phage library (Erd λ screen) was
amplified and a portion was converted into a plasmid expression library. The Erd λ screen
phage library was converted into a plasmid (phagemid) library by autosubcloning using the
E. coli host strain
BM25.8 as suggested by the manufacturer (Novagen). Plasmid DNA was purified from
BM25.8 cultures containing the pSCREEN recombinants and used to transform competent
cells of the expressing host strain BL21(DE3)pLysS. Transformed cells were aliquoted into
96 well microtiter plates with each well containing a pool size of ~50 colonies. Replica
plates of the 96 well plasmid library format were induced with IPTG to allow recombinant
protein expression. Following induction, the plates were centrifuged to pellet the E. coli and
the bacterial pellet was resuspended in 200 ul of 1X PBS.

Autologous dendritic cells were subsequently fed with the E. coli, washed and 
exposed to specific T cell lines in the presence of antibiotics to inhibit the bacterial growth. T
cell recognition was detected by proliferation and/or production of IFN-γ. Wells that scored
positive were then broken down using the same protocol until a single clone could be
detected. The gene was then sequenced, sub-cloned, expressed and the recombinant protein 
evaluated. 

B. Expression in E. coli of the full-length and overlapping constructs of 
HTCC#1 

One of the identified positive wells was further broken down until a single 
reactive clone (HTCC#1) was identified. Sequencing of the DNA insert followed by a search
of the GenBank database revealed 100% identity to sequences within the M. tuberculosis
locus MTCY7H7B (gene identification MTCY07H7B.06) located on region B of the cosmid
clone SCY07H7. The entire open reading frame is ~1,200 bp long and codes for a 40 kDa
(392 amino acids) protein (Fig. 1; HTCC#1 FL). Oligonucleotide PCR primers [5' (5'-CAA
TTA CAT ATG CAT CAC CAT CAC CAT CAC ATG AGC AGA GCG TTC ATC ATC-3')
and 3' (5'-CAT GGA ATT CGC CGT TAG ACG ACG TTT CGT A-3')] were designed to
amplify the full-length sequence of HTCC#1 from genomic DNA of the virulent Erdman
strain.
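
The relationship between the 392-residue protein and its reading frame can be checked with simple arithmetic. This is a rough sketch: the 110 Da average residue mass is a common rule of thumb rather than a value from this specification, and the function names are hypothetical.

```python
# Back-of-envelope check of ORF length and protein mass for a 392-residue protein.

AVG_RESIDUE_MASS_DA = 110.0  # assumption: rule-of-thumb average mass per amino acid

def orf_length_bp(n_residues, include_stop=True):
    """Three nucleotides per codon, plus one stop codon if requested."""
    return 3 * (n_residues + (1 if include_stop else 0))

def approx_mass_kda(n_residues, avg_da=AVG_RESIDUE_MASS_DA):
    """Crude mass estimate; the true value depends on amino acid composition."""
    return n_residues * avg_da / 1000.0

N_RESIDUES = 392
orf_bp = orf_length_bp(N_RESIDUES)
mass_kda = approx_mass_kda(N_RESIDUES)

print(f"ORF length with stop codon: {orf_bp} bp")
print(f"approximate mass: {mass_kda:.1f} kDa")
```

The 1,179 bp result agrees with the approximately 1,200 bp reading frame described above; the crude mass estimate (about 43 kDa) is somewhat above the stated 40 kDa, as expected for a composition-independent approximation.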

The 5' oligonucleotide contained an NdeI restriction site (CAT ATG) that includes the ATG
initiation codon, followed by nucleotide sequences encoding six histidines
and sequences derived from the gene. The resultant PCR product was digested with
NdeI and EcoRI and subcloned into the pET17b vector similarly digested with NdeI and
EcoRI. Ligation products were initially transformed into E. coli XL1-Blue competent cells
(Stratagene, La Jolla, CA) and were subsequently transformed into E. coli BL-21 (pLysE)
host cells (Novagen, Madison, WI) for expression.
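
The layout of the 5' oligonucleotide can be confirmed by locating the NdeI site and translating from the initiation codon. The sketch below is illustrative: the primer sequence is taken from the text, the codon table is the standard genetic code, and the helper name translate is hypothetical.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# 5' primer for full-length HTCC#1, as given in the text (spaces removed).
FIVE_PRIME_PRIMER = (
    "CAA TTA CAT ATG CAT CAC CAT CAC CAT CAC ATG AGC AGA GCG TTC ATC ATC"
).replace(" ", "")
NDEI_SITE = "CATATG"

def translate(seq):
    """Translate complete codons of seq in frame 0; trailing bases are dropped."""
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))

ndei_pos = FIVE_PRIME_PRIMER.find(NDEI_SITE)   # 0-based position of the NdeI site
start_pos = FIVE_PRIME_PRIMER.find("ATG")      # first ATG, inside the NdeI site
peptide = translate(FIVE_PRIME_PRIMER[start_pos:])

print(ndei_pos, start_pos, peptide)  # 6 9 MHHHHHHMSRAFII
```

The translated stretch reads Met, six His residues, and then Met-Ser-Arg-Ala-Phe-Ile-Ile from the gene-derived portion of the primer.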

C. Expression of the full length and overlapping constructs of HTCC#1 

Several attempts to express the full-length sequence of HTCC#1 in E. coli
failed either because no transformants could be obtained or because the E. coli host cells were
lysed following IPTG induction. HTCC#1 is 392 amino acids long and has 3 trans-membrane
(TM) domains which are presumably responsible for the lysing of the E. coli
culture following IPTG induction.

Thus expression of HTCC#1 was initially attempted by constructing two
overlapping fragments coding for the amino (residues 1-223; Fig. 2a) and carboxy (residues
184-392; Fig. 2b) halves.

The N-terminal (residues 1-223) fragment containing the first of the 3 putative 
transmembrane domains killed (lysed) the host cells, while the C-terminal (residues 184-392) 
half expressed at high levels in the same host cell. Thus the two trans-membrane domains
located in the C-terminal half do not appear to be toxic.

The N-terminal fragment, comprising amino acid residues 1-128 (devoid of 
the transmembrane domain), was therefore engineered for expression in the same pET17b 
vector system (Fig. 2c). This construct expressed quite well and there was no toxicity 
associated with the expressing E. coli host. 
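
The residue ranges of the fragments described above can be compared directly. This is a minimal sketch using 1-based, inclusive residue numbering as in the text; the function names overlap and length are hypothetical.

```python
# Overlap between residue ranges, using 1-based inclusive coordinates as in the text.

def length(rng):
    """Number of residues in an inclusive (start, end) range."""
    return rng[1] - rng[0] + 1

def overlap(a, b):
    """Return the shared inclusive range of a and b, or None if they do not overlap."""
    start, end = max(a[0], b[0]), min(a[1], b[1])
    return (start, end) if start <= end else None

N_HALF = (1, 223)      # amino half (Fig. 2a)
C_HALF = (184, 392)    # carboxy half (Fig. 2b)
N_SHORT = (1, 128)     # shorter N-terminal fragment (Fig. 2c)

shared = overlap(N_HALF, C_HALF)
print("N half:", length(N_HALF), "residues; C half:", length(C_HALF), "residues")
print("overlap:", shared, "=", length(shared), "residues")
print("short N fragment overlaps C half:", overlap(N_SHORT, C_HALF) is not None)
```

The two halves share residues 184-223 (40 residues), whereas the 1-128 fragment does not overlap the carboxy half at all.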

D. Expression in E. coli of the full-length HTCC#1 as a TbRa12 fusion construct

Because of problems associated with the expression of full length HTCC#1,
we evaluated the utility of a TbRa12 fusion construct for the generation of a fusion protein
that would allow for the stable expression of recombinant HTCC#1.

pET17b vector (Novagen) was modified to include TbRa12, a 14 kDa C-terminal
fragment of the serine protease antigen MTB32A of Mycobacterium tuberculosis
(Skeiky et al.). For use as an expression vector, the 3' stop codon of the TbRa12 was
substituted with an in frame EcoRI site and the N-terminal end was engineered so as to code
for six His-tag residues immediately following the initiator Met. This would facilitate a
simple one step purification protocol of TbRa12 recombinant proteins by affinity
chromatography over Ni-NTA matrix.

Specifically, the C-terminal fragment of antigen MTB32A was amplified by
standard PCR methods using the oligonucleotide primers 5' (5'-CAA TTA CAT ATG CAT
CAC CAT CAC CAT CAC ACG GCC GCG TCC GAT AAC TTC-3') and 3' (5'-CTA ATC
GAA TTC GGC CGG GGG TCC CTC GGC CAA-3'). The 450 bp product was digested with
NdeI and EcoRI and cloned into the pET17b expression vector similarly digested with the
same enzymes.

Recombinant HTCC#1 was engineered for expression as a fusion protein with
TbRa12 by designing oligonucleotide primers to specifically amplify the full length form.
The 5' oligonucleotide contained a thrombin recognition site. The resulting PCR amplified
product was digested with EcoRI and subcloned into the EcoRI site of the pET-TbRa12
vector. Following transformation into the E. coli host strain (XL1-blue; Stratagene), clones
containing the correct size insert were submitted for sequencing in order to identify those that
were in frame with the TbRa12 fusion. Subsequently, the DNA of interest (Fig. 3) was
transformed into the BL-21 (pLysE) bacterial host and the fusion protein was expressed
following induction of the culture with IPTG.

E. Expression in E. coli of HTCC#1 with deletions of the trans-membrane 
domain(s) 

Given the prediction that the 3 trans-membrane (TM) domains are
responsible for lysing the E. coli host following IPTG induction, recombinant constructs
lacking the TM domains were engineered for expression in E. coli.

1. Recombinant HTCC#1 with deletion of the first TM (ΔTM-1): A deletion
construct lacking the first trans-membrane domain (amino acid residues 150-160) was
engineered for expression in E. coli (Fig. 4a). This construct expressed reasonably well and
enough (low mg quantities) was purified for in vitro studies. This recombinant antigen was
comparable in in vitro assays to the full-length TbRa12 fusion construct.

T-cell epitope mapping of HTCC#1. Because of the generally low level of
expression using the ΔTM-1 construct, the design of the final form of HTCC#1 for
expression in E. coli was based on epitope mapping. The T-cell epitope was mapped using
30 overlapping peptides (Fig. 4b) with a PBMC readout (on four PPD+ donors). The data
revealed that peptides 8 through 16 (amino acid residues 92-215) were not immunogenic
(Fig. 4c).

2. Recombinant HTCC#1 with deletion of all of the TM domains (ΔTM-2):
A deletion construct of HTCC#1 lacking residues 101 to 203 with a predicted molecular
weight of 30.4 kDa was engineered for expression in E. coli. The full length HTCC#1 is 40
kDa. There was no toxicity associated with this new deletion construct and the expression
level was higher than that of the ΔTM-1 construct (Fig. 4d).
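
The sizes of the two deletion constructs follow from the residue ranges given above. The sketch below is a rough illustration: the proportional mass estimate simply scales the stated 40 kDa by residue count, which is an assumption of this example and ignores amino acid composition and any tag residues; the function names are hypothetical.

```python
# Residue counts for the two HTCC#1 deletion constructs (1-based inclusive ranges).

FULL_LENGTH = 392        # residues in full-length HTCC#1
FULL_MASS_KDA = 40.0     # stated mass of full-length HTCC#1

def remaining_residues(full_length, deleted):
    """Residues left after removing the inclusive range `deleted`."""
    removed = deleted[1] - deleted[0] + 1
    return full_length - removed

def proportional_mass_kda(n_residues):
    """Scale the full-length mass by residue count (crude estimate)."""
    return FULL_MASS_KDA * n_residues / FULL_LENGTH

first_tm_deleted = remaining_residues(FULL_LENGTH, (150, 160))   # first TM removed
all_tm_deleted = remaining_residues(FULL_LENGTH, (101, 203))     # residues 101-203 removed

print("first-TM deletion:", first_tm_deleted, "residues")
print("101-203 deletion:", all_tm_deleted, "residues,",
      f"~{proportional_mass_kda(all_tm_deleted):.1f} kDa by proportion")
```

The 101-203 deletion leaves 289 residues; the proportional estimate of about 29.5 kDa is in the same range as the predicted 30.4 kDa stated above.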

F. Fusion constructs of HTCC#1 and TbH9: 

Fig. 5 shows a sequence of HTCC#1 (184-392)-TbH9-HTCC#1 (1-129)
Fig. 6 shows a sequence of HTCC#1 (1-149)-TbH9-HTCC#1 (161-392)
Fig. 7 shows a sequence of HTCC#1 (184-392)-TbH9-HTCC#1 (1-200)

One of skill in the art will appreciate that the order of the individual antigens
within each fusion protein may be changed and that comparable activity would be expected 
provided that each of the epitopes is still functionally available. In addition, truncated forms 
of the proteins containing active epitopes may be used in the construction of fusion proteins. 
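
The segment layout of the HTCC#1/TbH9 fusion constructs listed in section F can be sketched as string assembly. This is illustrative only: the sequences below are placeholders rather than the actual antigen sequences (which appear in the figures), the TbH9 placeholder length of 50 is arbitrary, and the function names segment and assemble are hypothetical.

```python
# Assemble fusion layouts from residue ranges (1-based, inclusive, as in the text).

def segment(seq, start, end):
    """Return residues start..end of seq using 1-based inclusive numbering."""
    if not (1 <= start <= end <= len(seq)):
        raise ValueError(f"range {start}-{end} is outside 1-{len(seq)}")
    return seq[start - 1:end]

def assemble(*parts):
    """Concatenate parts in N- to C-terminal order."""
    return "".join(parts)

# Placeholder sequences only; NOT the real antigens.
HTCC1 = ("ACDEFGHIKLMNPQRSTVWY" * 20)[:392]   # 392-residue stand-in for HTCC#1
TBH9 = "G" * 50                               # stand-in; real TbH9 length not given here

constructs = {
    "HTCC#1(184-392)-TbH9-HTCC#1(1-129)":
        assemble(segment(HTCC1, 184, 392), TBH9, segment(HTCC1, 1, 129)),
    "HTCC#1(1-149)-TbH9-HTCC#1(161-392)":
        assemble(segment(HTCC1, 1, 149), TBH9, segment(HTCC1, 161, 392)),
    "HTCC#1(184-392)-TbH9-HTCC#1(1-200)":
        assemble(segment(HTCC1, 184, 392), TBH9, segment(HTCC1, 1, 200)),
}

for name, seq in constructs.items():
    print(name, "HTCC#1-derived residues:", len(seq) - len(TBH9))
```

Reordering the arguments to assemble models the rearrangement of antigens within a fusion, as contemplated in the paragraph above.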

From the foregoing, it will be appreciated that, although specific embodiments 
of the invention have been described herein for the purpose of illustration, various 
modifications may be made without deviating from the spirit and scope of the invention.

WHAT IS CLAIMED IS:

1 1. A pharmaceutical composition comprising an MTb81 antigen or an

2 immunogenic fragment thereof from a Mycobacterium species of the tuberculosis complex, 

3 and an Mo2 antigen or an immunogenic fragment thereof from a Mycobacterium species of 

4 the tuberculosis complex. 

1 2. The composition of claim 1, wherein the antigens are covalently 

2 linked, thereby forming a fusion polypeptide. 

1 3. The composition of claim 2, wherein the fusion polypeptide has the 

2 amino acid sequence of TbF14. 

1 4. A pharmaceutical composition comprising a TbRa3 antigen or an 

2 immunogenic fragment thereof from a Mycobacterium species of the tuberculosis complex, a 

3 38kD antigen or an immunogenic fragment thereof from a Mycobacterium species of the 

4 tuberculosis complex, a Tb38-1 antigen or an immunogenic fragment thereof from a 

5 Mycobacterium species of the tuberculosis complex, and a FL TbH4 antigen or an 

6 immunogenic fragment thereof from a Mycobacterium species of the tuberculosis complex. 

1 5. The composition of claim 4, wherein the antigens are covalently 

2 linked, thereby forming a fusion polypeptide. 

1 6. The composition of claim 5, wherein the fusion polypeptide has the 

2 amino acid sequence of TbF15.

1 7. A pharmaceutical composition comprising an HTCC#1 antigen or an 

2 immunogenic fragment thereof from a Mycobacterium species of the tuberculosis complex,

3 and a TbH9 antigen or an immunogenic fragment thereof from a Mycobacterium species of 

4 the tuberculosis complex. 

1 8. The composition of claim 7, wherein the antigens are covalently 

2 linked, thereby forming a fusion polypeptide. 

1 9. The composition of claim 7, comprising a full-length HTCC#1 antigen 

2 from a Mycobacterium species of the tuberculosis complex, and a full-length TbH9 antigen 

3 from a Mycobacterium species of the tuberculosis complex.

1 10. The composition of claim 9, wherein the antigens are covalently

2 linked, thereby forming a fusion polypeptide. 

1 11. The composition of claim 10, wherein the fusion polypeptide has the

2 amino acid sequence of HTCC#1 (FL)-TbH9(FL). 

1 12. The composition of claim 7, comprising a polypeptide comprising 

2 amino acids 184-392 of an HTCC#1 antigen from a Mycobacterium species of the 

3 tuberculosis complex, a TbH9 antigen or an immunogenic fragment thereof from a 

4 Mycobacterium species of the tuberculosis complex, and a polypeptide comprising amino 

5 acids 1-129 of an HTCC#1 antigen from a Mycobacterium species of the tuberculosis 

6 complex. 

1 13. The composition of claim 12, wherein the antigens are covalently 

2 linked, thereby forming a fusion polypeptide. 

1 14. The composition of claim 13, wherein the fusion polypeptide has the 

2 amino acid sequence of HTCC#1(184-392)/TbH9/HTCC#1(1-129).

1 15. A pharmaceutical composition comprising a TbRa12 antigen or an

2 immunogenic fragment thereof from a Mycobacterium species of the tuberculosis complex, 

3 and an HTCC#1 antigen or an immunogenic fragment thereof from a Mycobacterium species 

4 of the tuberculosis complex. 

1 16. The composition of claim 15, wherein the antigens are covalently 

2 linked, thereby forming a fusion polypeptide. 

1 17. The composition of claim 16, wherein the fusion polypeptide has the 

2 amino acid sequence of TbRa12-HTCC#1.

1 18. A pharmaceutical composition comprising at least two heterologous 

2 antigens from a Mycobacterium species of the tuberculosis complex or an immunogenic 

3 fragment thereof, wherein the antigen or immunogenic fragment thereof is selected from the

4 group consisting of MTb81, Mo2, TbRa3, 38kD, Tb38-1 (MTb11), FL TbH4, HTCC#1

5 (Mtb40), TbH9, MTCC#2 (Mtb41), DPEP, DPPD, TbRa35, TbRa12, MTb59, MTb82, Erd14

6 (Mtb16), FL TbRa35 (Mtb32A), DPV (Mtb8.4), MSL (Mtb9.8), MTI (Mtb9.9A, also known

7 as MTI-A), ESAT-6, α-crystalline, and 85 complex.



1 19. The composition of claim 18, wherein the antigens are covalently 

2 linked, thereby forming a fusion polypeptide. 

1 20. The composition of claim 1, 4, 7, 15, or 18, wherein the antigens are

2 covalently linked via a chemical linker. 

1 21. The composition of claim 20, wherein the chemical linker is an amino 

2 acid linker. 

1 22. The composition of claim 1, 4, 7, 15, or 18, further comprising at least 

2 one additional antigen from a Mycobacterium species of the tuberculosis complex, wherein 



3 the antigen is selected from the group consisting of MTb81, Mo2, TbRa3, 38kD, Tb38-1 

4 (MTb11), FL TbH4, HTCC#1 (Mtb40), TbH9, MTCC#2 (Mtb41), DPEP, DPPD, TbRa35,

5 TbRa12, MTb59, MTb82, Erd14 (Mtb16), FL TbRa35 (Mtb32A), DPV (Mtb8.4), MSL

6 (Mtb9.8), MTI (Mtb9.9A, also known as MTI-A), ESAT-6, α-crystalline, and 85 complex, or



7 an immunogenic fragment thereof. 

1 23. The composition of claim 1, 4, 7, 15, or 18, further comprising an 

2 adjuvant. 

1 24. The composition of claim 23, wherein the adjuvant comprises QS21 

2 and MPL. 

1 25. The composition of claim 23, wherein the adjuvant is selected from the 

2 group consisting of AS2, ENHANZYN, MPL, QS21, CWS, TDM, AGP, CPG, Leif, saponin, 

3 and saponin mimetics. 

1 26. The composition of claim 1, 4, 7, 15, or 18, further comprising BCG. 

1 27. The composition of claim 1, 4, 7, 15, or 18, further comprising an NS1

2 antigen or an immunogenic fragment thereof from a Mycobacterium species of the 

3 tuberculosis complex. 

1 28. The composition of claim 1, 4, 7, 15, or 18, wherein the 

2 Mycobacterium species is Mycobacterium tuberculosis.

1 29. An expression cassette comprising a nucleic acid encoding an MTb81

2 antigen or an immunogenic fragment thereof from a Mycobacterium species of the 

3 tuberculosis complex, and a nucleic acid encoding an Mo2 antigen or an immunogenic 

4 fragment thereof from a Mycobacterium species of the tuberculosis complex. 

1 30. The expression cassette of claim 29, wherein the nucleic acid encodes 

2 a fusion polypeptide comprising an MTb81 antigen or an immunogenic fragment thereof and 

3 a nucleic acid encoding an Mo2 antigen or an immunogenic fragment thereof. 

1 31. The expression cassette of claim 30, wherein the nucleic acid encodes 

2 a fusion polypeptide having the amino acid sequence of TbF14. 

1 32. The expression cassette of claim 31, wherein the nucleic acid has the

2 nucleotide sequence of the nucleic acid encoding TbF14.

1 33. An expression cassette comprising a nucleic acid encoding a TbRa3 

2 antigen or an immunogenic fragment thereof from a Mycobacterium species of the 

3 tuberculosis complex, a nucleic acid encoding a 38kD antigen or an immunogenic fragment 

4 thereof from a Mycobacterium species of the tuberculosis complex, a nucleic acid encoding a 

5 Tb38-1 antigen or an immunogenic fragment thereof from a Mycobacterium species of the 

6 tuberculosis complex, and a nucleic acid encoding a FL TbH4 antigen or an immunogenic 

7 fragment thereof from a Mycobacterium species of the tuberculosis complex. 

1 34. The expression cassette of claim 33, wherein the nucleic acid encodes 

2 a fusion polypeptide comprising a TbRa3 antigen or an immunogenic fragment thereof, a 

3 38kD antigen or an immunogenic fragment thereof, a Tb38-1 antigen or an immunogenic 

4 fragment thereof, and a nucleic acid encoding a FL TbH4 antigen or an immunogenic 

5 fragment thereof.

1 35. The expression cassette of claim 34, wherein the nucleic acid encodes 

2 a fusion polypeptide having the amino acid sequence of TbF15.

1 36. The expression cassette of claim 35, wherein the nucleic acid has the 

2 nucleotide sequence of the nucleic acid encoding TbF15.

1 37. An expression cassette comprising a nucleic acid encoding an

2 HTCC#1 antigen or an immunogenic fragment thereof from a Mycobacterium species of the 

3 tuberculosis complex, and a nucleic acid encoding a TbH9 antigen or an immunogenic 

4 fragment thereof from a Mycobacterium species of the tuberculosis complex. 

1 38. The expression cassette of claim 37, comprising a nucleic acid 

2 encoding a full-length HTCC#1 antigen from a Mycobacterium species of the tuberculosis 

3 complex, and a nucleic acid encoding a full-length TbH9 antigen from a Mycobacterium 

4 species of the tuberculosis complex. 

1 39. The expression cassette of claim 37, comprising a nucleic acid 

2 encoding a polypeptide comprising amino acids 184-392 of an HTCC#1 antigen from a 

3 Mycobacterium species of the tuberculosis complex, a nucleic acid encoding a TbH9 antigen 

4 or an immunogenic fragment thereof from a Mycobacterium species of the tuberculosis 

5 complex, and a nucleic acid encoding a polypeptide comprising amino acids 1-129 of an 

6 HTCC#1 antigen from a Mycobacterium species of the tuberculosis complex.

1 40. The expression cassette of claim 37, wherein the nucleic acid encodes

2 a fusion polypeptide comprising an HTCC#1 antigen or an immunogenic fragment thereof, 

3 and a TbH9 antigen or an immunogenic fragment thereof. 

1 41. The expression cassette of claim 38, wherein the nucleic acid encodes 

2 a fusion polypeptide comprising a full-length HTCC#1 antigen, and a full-length TbH9 

3 antigen. 

1 42. The expression cassette of claim 39, wherein the nucleic acid encodes 

2 a fusion polypeptide comprising amino acids 184-392 of an HTCC#1, a TbH9 antigen or an 

3 immunogenic fragment thereof, and amino acids 1-129 of an HTCC#1 antigen. 

1 43. The expression cassette of claim 41, wherein the nucleic acid encodes 

2 a fusion polypeptide having the amino acid sequence of HTCC#1(FL)-TbH9(FL).

1 44. The expression cassette of claim 43, wherein the nucleic acid has the 

2 nucleotide sequence of the nucleic acid encoding HTCC#1(FL)-TbH9(FL). 

1 45. The expression cassette of claim 42, wherein the nucleic acid encodes 

2 a fusion polypeptide having the amino acid sequence of 

3 HTCC#1(184-392)/TbH9/HTCC#1(1-129). 

1 46. The expression cassette of claim 45, wherein the nucleic acid has the 

2 nucleotide sequence of the nucleic acid encoding HTCC#1(184-392)/TbH9/HTCC#1(1-129). 

1 47. An expression cassette comprising a nucleic acid encoding a TbRa12 

2 antigen or an immunogenic fragment thereof from a Mycobacterium species of the 

3 tuberculosis complex, and a nucleic acid encoding an HTCC#1 antigen or an immunogenic 

4 fragment thereof from a Mycobacterium species of the tuberculosis complex. 

1 48. The expression cassette of claim 47, wherein the nucleic acid encodes 

2 a fusion polypeptide comprising an Ra12 antigen or an immunogenic fragment thereof, and 

3 an HTCC#1 antigen or an immunogenic fragment thereof. 

1 49. The expression cassette of claim 48, wherein the nucleic acid encodes 

2 a fusion polypeptide having the amino acid sequence of TbRa12-HTCC#1. 

1 50. The expression cassette of claim 49, wherein the nucleic acid has the 

2 nucleotide sequence of the nucleic acid encoding TbRa12-HTCC#1. 

1 51. An expression cassette comprising a nucleic acid encoding at least two 

2 heterologous antigens from a Mycobacterium species of the tuberculosis complex or an 

3 immunogenic fragment thereof, wherein the antigen or immunogenic fragment thereof is 

4 selected from the group consisting of MTb81, Mo2, TbRa3, 38kD, Tb38-1 (MTb11), FL 

5 TbH4, HTCC#1 (Mtb40), TbH9, MTCC#2 (Mtb41), DPEP, DPPD, TbRa35, TbRa12, 

6 MTb59, MTb82, Erd14 (Mtb16), FL TbRa35 (Mtb32A), DPV (Mtb8.4), MSL (Mtb9.8), MTI 

7 (Mtb9.9A, also known as MTI-A), ESAT-6, α-crystalline, and 85 complex. 



1 52. The expression cassette of claim 51, wherein the nucleic acid encodes 

2 a fusion polypeptide. 

1 53. The expression cassette of claim 29, 33, 37, 47 or 51, further 

2 comprising a nucleic acid encoding at least one additional antigen from a Mycobacterium 

3 species of the tuberculosis complex, wherein the antigen is selected from the group consisting 

4 of MTb81, Mo2, TbRa3, 38kD, Tb38-1 (MTb11), FL TbH4, HTCC#1 (Mtb40), TbH9, 

5 MTCC#2 (Mtb41), DPEP, DPPD, TbRa35, TbRa12, MTb59, MTb82, Erd14 (Mtb16), FL 

6 TbRa35 (Mtb32A), DPV (Mtb8.4), MSL (Mtb9.8), MTI, ESAT-6, α-crystalline, and 85 

7 complex, or an immunogenic fragment thereof. 

1 54. The expression cassette of claim 29, 33, 37, 47 or 51, further 

2 comprising a nucleic acid encoding an NS1 antigen or an antigenic fragment thereof from a 

3 Mycobacterium species of the tuberculosis complex. 

1 55. The expression cassette of claim 29, 33, 37, 47 or 51, wherein the 

2 Mycobacterium species is Mycobacterium tuberculosis. 

1 56. A method for eliciting an immune response in a mammal, the method 

2 comprising the step of administering to the mammal an immunologically effective amount of 

3 a pharmaceutical composition comprising an MTb81 antigen or an immunogenic fragment 

4 thereof from a Mycobacterium species of the tuberculosis complex, and an Mo2 antigen or an 

5 immunogenic fragment thereof from a Mycobacterium species of the tuberculosis complex. 

1 57. The method of claim 56, wherein the antigens are covalently linked, 

2 thereby forming a fusion polypeptide. 

1 58. The method of claim 57, wherein the fusion polypeptide has the amino 

2 acid sequence of TbF14. 

1 59. A method for eliciting an immune response in a mammal, the method 

2 comprising the step of administering to the mammal an immunologically effective amount of 

3 a pharmaceutical composition comprising a TbRa3 antigen or an immunogenic fragment 

4 thereof from a Mycobacterium species of the tuberculosis complex, a 38kD antigen or an 

5 immunogenic fragment thereof from a Mycobacterium species of the tuberculosis complex, a 

6 Tb38-1 antigen or an immunogenic fragment thereof from a Mycobacterium species of the 

7 tuberculosis complex, and a FL TbH4 antigen or an immunogenic fragment thereof from a 

8 Mycobacterium species of the tuberculosis complex. 

1 60. The method of claim 59, wherein the antigens are covalently linked, 

2 thereby forming a fusion polypeptide. 

1 61. The method of claim 60, wherein the fusion polypeptide has the amino 

2 acid sequence of TbF15. 

1 62. A method for eliciting an immune response in a mammal, the method 

2 comprising the step of administering to the mammal an immunologically effective amount of 

3 a pharmaceutical composition comprising an HTCC#1 antigen or an immunogenic fragment 

4 thereof from a Mycobacterium species of the tuberculosis complex, and a TbH9 antigen or an 

5 immunogenic fragment thereof from a Mycobacterium species of the tuberculosis complex. 

1 63. The method of claim 62, wherein the pharmaceutical composition 

2 comprises a full-length HTCC#1 antigen from a Mycobacterium species of the tuberculosis 

3 complex, and a full-length TbH9 antigen from a Mycobacterium species of the tuberculosis 

4 complex. 

1 64. The method of claim 63, wherein the antigens are covalently linked, 

2 thereby forming a fusion polypeptide. 

1 65. The method of claim 64, wherein the fusion polypeptide has the amino 

2 acid sequence of HTCC#1(FL)-TbH9(FL). 

1 66. The method of claim 62, wherein the pharmaceutical composition 

2 comprises a polypeptide comprising amino acids 184-392 of an HTCC#1 antigen from a 

3 Mycobacterium species of the tuberculosis complex, a TbH9 antigen or an immunogenic 

4 fragment thereof from a Mycobacterium species of the tuberculosis complex, and a 

5 polypeptide comprising amino acids 1-129 of an HTCC#1 antigen from a Mycobacterium 

6 species of the tuberculosis complex. 

1 67. The method of claim 66, wherein the antigens are covalently linked, 

2 thereby forming a fusion polypeptide. 

1 68. The method of claim 67, wherein the fusion polypeptide has the amino 

2 acid sequence of HTCC#1(184-392)/TbH9/HTCC#1(1-129). 

1 69. A method for eliciting an immune response in a mammal, the method 

2 comprising the step of administering to the mammal an immunologically effective amount of 

3 a pharmaceutical composition comprising a TbRa12 antigen or an immunogenic fragment 

4 thereof from a Mycobacterium species of the tuberculosis complex, and an HTCC#1 antigen 

5 or an immunogenic fragment thereof from a Mycobacterium species of the tuberculosis 

6 complex. 

1 70. The method of claim 69, wherein the antigens are covalently linked, 

2 thereby forming a fusion polypeptide. 

1 71. The method of claim 70, wherein the fusion polypeptide has the amino 

2 acid sequence of TbRa12-HTCC#1. 

1 72. A method for eliciting an immune response in a mammal, the method 

2 comprising the step of administering to the mammal an immunologically effective amount of 

3 a pharmaceutical composition comprising at least two heterologous antigens from a 

4 Mycobacterium species of the tuberculosis complex or an immunogenic fragment thereof, 

5 wherein the antigen or immunogenic fragment thereof is selected from the group consisting of 

6 MTb81, Mo2, TbRa3, 38kD, Tb38-1 (MTb11), FL TbH4, HTCC#1 (Mtb40), TbH9, 

7 MTCC#2 (Mtb41), DPEP, DPPD, TbRa35, TbRa12, MTb59, MTb82, Erd14 (Mtb16), FL 

8 TbRa35 (Mtb32A), DPV (Mtb8.4), MSL (Mtb9.8), MTI (Mtb9.9A, also known as MTI-A), 

9 ESAT-6, α-crystalline, and 85 complex. 



1 73. The method of claim 72, wherein the antigens are covalently linked, 

2 thereby forming a fusion protein. 

1 74. The method of claim 56, 59, 62, 69, or 72, wherein the mammal has 

2 been immunized with BCG. 

1 75. The method of claim 56, 59, 62, 69, or 72, wherein the mammal is a 

2 human. 

1 76. The method of claim 56, 59, 62, 69, or 72, wherein the composition is 

2 administered prophylactically. 

1 77. The method of claim 56, 59, 62, 69, or 72, wherein the pharmaceutical 

2 composition further comprises an adjuvant. 

1 78. The method of claim 77, wherein the adjuvant comprises QS21 and 

2 MPL. 

1 79. The method of claim 77, wherein the adjuvant is selected from the 

2 group consisting of AS2, ENHANZYN, MPL, QS21, CWS, TDM, AGP, CPG, LeIF, saponin, 

3 and saponin mimetics. 

1 80. A method for eliciting an immune response in a mammal, the method 

2 comprising the step of administering to the mammal an immunologically effective amount of 

3 an expression cassette comprising a nucleic acid encoding an MTb81 antigen or an 

4 immunogenic fragment thereof from a Mycobacterium species of the tuberculosis complex, 

5 and a nucleic acid encoding an Mo2 antigen or an immunogenic fragment thereof from a 

6 Mycobacterium species of the tuberculosis complex. 

1 81. The method of claim 80, wherein the nucleic acid encodes a fusion 

2 polypeptide comprising an MTb81 antigen or an immunogenic fragment thereof, and an Mo2 

3 antigen or an immunogenic fragment thereof. 

1 82. The method of claim 81, wherein the nucleic acid encodes a fusion 

2 polypeptide having the amino acid sequence of TbF14. 

1 83. The method of claim 82, wherein the nucleic acid has the nucleotide 

2 sequence of the nucleic acid encoding TbF14. 

1 84. A method for eliciting an immune response in a mammal, the method 

2 comprising the step of administering to the mammal an immunologically effective amount of 

3 an expression cassette comprising a nucleic acid encoding a TbRa3 antigen or an 

4 immunogenic fragment thereof from a Mycobacterium species of the tuberculosis complex, a 

5 nucleic acid encoding a 38kD antigen or an immunogenic fragment thereof from a 

6 Mycobacterium species of the tuberculosis complex, a nucleic acid encoding a Tb38-1 

7 antigen or an immunogenic fragment thereof from a Mycobacterium species of the 

8 tuberculosis complex, and a nucleic acid encoding a FL TbH4 antigen or an immunogenic 

9 fragment thereof from a Mycobacterium species of the tuberculosis complex. 

1 85. The method of claim 84, wherein the nucleic acid encodes a fusion 

2 polypeptide comprising a TbRa3 antigen or an immunogenic fragment thereof, a 38kD 

3 antigen or an immunogenic fragment thereof, a Tb38-1 antigen or an immunogenic fragment 

4 thereof, and a FL TbH4 antigen or an immunogenic fragment thereof. 

1 86. The method of claim 85, wherein the nucleic acid encodes a fusion 

2 polypeptide having the amino acid sequence of TbF15. 

1 87. The method of claim 86, wherein the nucleic acid has the nucleotide 

2 sequence of the nucleic acid encoding TbF15. 

1 88. A method for eliciting an immune response in a mammal, the method 

2 comprising the step of administering to the mammal an immunologically effective amount of 

3 an expression cassette comprising a nucleic acid encoding an HTCC#1 antigen or an 

4 immunogenic fragment thereof from a Mycobacterium species of the tuberculosis complex, 

5 and a nucleic acid encoding a TbH9 antigen or an immunogenic fragment thereof from a 

6 Mycobacterium species of the tuberculosis complex. 

1 89. The method of claim 88, wherein the nucleic acid encodes a fusion 

2 polypeptide comprising an HTCC#1 antigen or an immunogenic fragment thereof, and a 

3 TbH9 antigen or an immunogenic fragment thereof. 

1 90. The method of claim 89, wherein the nucleic acid encodes a fusion 

2 polypeptide comprising a full-length HTCC#1 antigen or an immunogenic fragment thereof, 

3 and a full-length TbH9 antigen or an immunogenic fragment thereof. 

1 91. The method of claim 90, wherein the nucleic acid encodes a fusion 

2 polypeptide having the amino acid sequence of HTCC#1(FL)-TbH9(FL). 

1 92. The method of claim 91, wherein the nucleic acid has the nucleotide 

2 sequence of the nucleic acid encoding HTCC#1(FL)-TbH9(FL). 

1 93. The method of claim 89, wherein the nucleic acid encodes a fusion 

2 polypeptide comprising a polypeptide comprising amino acids 184-392 of an HTCC#1 

3 antigen, a TbH9 antigen or an immunogenic fragment thereof, and a polypeptide comprising 

4 amino acids 1-129 of an HTCC#1 antigen. 

1 94. The method of claim 93, wherein the nucleic acid encodes a fusion 

2 polypeptide having the amino acid sequence of HTCC#1(184-392)/TbH9/HTCC#1(1-129). 

1 95. The method of claim 93, wherein the nucleic acid has the nucleotide 

2 sequence of the nucleic acid encoding HTCC#1(184-392)/TbH9/HTCC#1(1-129). 

1 96. A method for eliciting an immune response in a mammal, the method 

2 comprising the step of administering to the mammal an immunologically effective amount of 

3 an expression cassette comprising a nucleic acid encoding a TbRa12 antigen or an 

4 immunogenic fragment thereof from a Mycobacterium species of the tuberculosis complex, 

5 and a nucleic acid encoding an HTCC#1 antigen or an immunogenic fragment thereof from a 

6 Mycobacterium species of the tuberculosis complex. 

1 97. The method of claim 96, wherein the nucleic acid encodes a fusion 

2 polypeptide comprising a TbRa12 antigen or an immunogenic fragment thereof, and an 

3 HTCC#1 antigen or an immunogenic fragment thereof. 

1 98. The method of claim 97, wherein the nucleic acid encodes a fusion 

2 polypeptide having the amino acid sequence of TbRa12-HTCC#1. 

1 99. The method of claim 98, wherein the nucleic acid has the nucleotide 

2 sequence of the nucleic acid encoding TbRa12-HTCC#1. 

1 100. A method for eliciting an immune response in a mammal, the method 



2 comprising the step of administering to the mammal an immunologically effective amount of 

3 an expression cassette comprising a nucleic acid encoding at least two heterologous antigens 

4 from a Mycobacterium species of the tuberculosis complex or an immunogenic fragment 

5 thereof, wherein the antigen or immunogenic fragment thereof is selected from the group 

6 consisting of MTb81, Mo2, TbRa3, 38kD, Tb38-1 (MTb11), FL TbH4, HTCC#1 (Mtb40), 

7 TbH9, MTCC#2 (Mtb41), DPEP, DPPD, TbRa35, TbRa12, MTb59, MTb82, Erd14 (Mtb16), 

8 FL TbRa35 (Mtb32A), DPV (Mtb8.4), MSL (Mtb9.8), MTI (Mtb9.9A, also known as 

9 MTI-A), ESAT-6, α-crystalline, and 85 complex. 



1 101. The method of claim 100, wherein the nucleic acid encodes a fusion 

2 polypeptide. 

1 102. The method of claim 80, 84, 88, 96, or 100, wherein the mammal has 

2 been immunized with BCG. 

1 103. The method of claim 80, 84, 88, 96, or 100, wherein the mammal is a 

2 human. 

1 104. The method of claim 80, 84, 88, 96, or 100, wherein the composition is 

2 administered prophylactically. 



1 105. A fusion protein comprising an MTb81 antigen or an immunogenic 

2 fragment thereof from a Mycobacterium species of the tuberculosis complex, and an Mo2 

3 antigen or an immunogenic fragment thereof from a Mycobacterium species of the 

4 tuberculosis complex. 

1 106. The protein of claim 105, wherein the fusion polypeptide has the 

2 amino acid sequence of TbF14. 

1 107. A fusion protein comprising a TbRa3 antigen or an immunogenic 



2 fragment thereof from a Mycobacterium species of the tuberculosis complex, a 38kD antigen 

3 or an immunogenic fragment thereof from a Mycobacterium species of the tuberculosis 

4 complex, a Tb38-1 antigen or an immunogenic fragment thereof from a Mycobacterium 

5 species of the tuberculosis complex, and a FL TbH4 antigen or an immunogenic fragment 

6 thereof from a Mycobacterium species of the tuberculosis complex. 



1 108. The protein of claim 107, wherein the fusion polypeptide has the 

2 amino acid sequence of TbF15. 

1 109. A fusion protein comprising an HTCC#1 antigen or an immunogenic 

2 fragment thereof from a Mycobacterium species of the tuberculosis complex, and a TbH9 

3 antigen or an immunogenic fragment thereof from a Mycobacterium species of the 

4 tuberculosis complex. 

1 110. The protein of claim 109, comprising a full-length HTCC#1 antigen 

2 from a Mycobacterium species of the tuberculosis complex, and a full-length TbH9 antigen 

3 from a Mycobacterium species of the tuberculosis complex. 

1 111. The protein of claim 110, wherein the fusion polypeptide has the 

2 amino acid sequence of HTCC#1(FL)-TbH9(FL). 

1 112. The protein of claim 109, comprising a polypeptide comprising amino 

2 acids 184-392 of an HTCC#1 antigen from a Mycobacterium species of the tuberculosis 

3 complex, a TbH9 antigen or an immunogenic fragment thereof from a Mycobacterium 

4 species of the tuberculosis complex, and a polypeptide comprising amino acids 1-129 of an 

5 HTCC#1 antigen from a Mycobacterium species of the tuberculosis complex. 

1 113. The protein of claim 112, wherein the fusion polypeptide has the 

2 amino acid sequence of HTCC#1(184-392)/TbH9/HTCC#1(1-129). 

1 114. A fusion protein comprising a TbRa12 antigen or an immunogenic 

2 fragment thereof from a Mycobacterium species of the tuberculosis complex, and an 

3 HTCC#1 antigen or an immunogenic fragment thereof from a Mycobacterium species of the 

4 tuberculosis complex. 

1 115. The protein of claim 114, wherein the fusion polypeptide has the 

2 amino acid sequence of TbRa12-HTCC#1. 
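
The segment order recited for the HTCC#1(184-392)/TbH9/HTCC#1(1-129) fusion can be illustrated with a minimal Python sketch. It assumes that the residue ranges are 1-based and inclusive and that the segments are joined directly, with no linker or tag. The function names and both placeholder sequences are hypothetical; they are not the antigen sequences of the disclosure and serve only to show the arithmetic of the ranges.

```python
# Illustrative sketch only: placeholder sequences, hypothetical helper names.

def residues(protein: str, first: int, last: int) -> str:
    """Return residues first..last of a protein (1-based, inclusive)."""
    if not 1 <= first <= last <= len(protein):
        raise ValueError(f"range {first}-{last} is outside 1-{len(protein)}")
    return protein[first - 1:last]


def assemble_fusion(*segments: str) -> str:
    """Join segments in the order given, N-terminus first, with no linker."""
    return "".join(segments)


# Placeholder stand-ins (NOT real antigen sequences). The HTCC#1 stand-in is
# simply made 392 residues long so that the recited ranges are valid.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
HTCC1_PLACEHOLDER = "".join(AMINO_ACIDS[i % 20] for i in range(392))
TBH9_PLACEHOLDER = "M" + "G" * 49

fusion = assemble_fusion(
    residues(HTCC1_PLACEHOLDER, 184, 392),  # C-terminal portion placed first
    TBH9_PLACEHOLDER,                       # TbH9 in the middle
    residues(HTCC1_PLACEHOLDER, 1, 129),    # N-terminal portion placed last
)

if __name__ == "__main__":
    print(len(residues(HTCC1_PLACEHOLDER, 184, 392)), "residues in segment 1")
    print(len(fusion), "residues in the placeholder fusion")
```

Under these assumptions the first segment spans 209 residues and the last spans 129 residues; the total length of an actual fusion would depend on the real TbH9 sequence and on any linker or tag residues, none of which are modeled here.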
PATENT 

Attorney Docket No.: 014058-009041US 
Client Ref. Nos. CX-99-0045 and CX-99-0061 

FUSION PROTEINS OF MYCOBACTERIUM TUBERCULOSIS 

ABSTRACT OF THE DISCLOSURE 
The present invention relates to fusion proteins containing at least two 
Mycobacterium species antigens. In particular, it relates to nucleic acids encoding fusion 
proteins that include two or more individual M. tuberculosis antigens, which increase 
serological sensitivity of sera from individuals infected with tuberculosis, and methods for 
their use in the diagnosis, treatment, and prevention of tuberculosis infection. 




Figure 1: Nucleotide Sequence of TbF14 
Sheet 1 of 4 



FEATURES             Location/Qualifiers
     misc_feature    5072..5095
                     /note="His tag coding region"
     misc_feature    5096..7315
                     /note="MtB81 coding region"
     misc_feature    7316..8594
                     /note="Mo2 coding region"
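
The feature coordinates above can be checked with a short Python sketch. It assumes the locations follow the usual feature-table convention of 1-based, inclusive positions on the plasmid sequence shown in the figure; the `Feature` class and the helper names are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    note: str
    start: int  # 1-based, inclusive (assumed convention)
    end: int    # 1-based, inclusive (assumed convention)

    @property
    def length(self) -> int:
        """Number of nucleotides covered by the feature."""
        return self.end - self.start + 1

    def extract(self, sequence: str) -> str:
        """Slice the feature out of a full sequence string."""
        return sequence[self.start - 1:self.end]


# Coordinates copied from the FEATURES table of Figure 1.
FEATURES = [
    Feature("His tag coding region", 5072, 5095),
    Feature("MtB81 coding region", 5096, 7315),
    Feature("Mo2 coding region", 7316, 8594),
]


def are_contiguous(features) -> bool:
    """True when each feature starts right after the previous one ends."""
    return all(a.end + 1 == b.start for a, b in zip(features, features[1:]))


if __name__ == "__main__":
    for f in FEATURES:
        print(f"{f.note}: {f.start}..{f.end} ({f.length} nt)")
    print("contiguous:", are_contiguous(FEATURES))
```

Read this way, the three regions abut one another with no gap, which is consistent with a single open reading frame running from the His tag through the MtB81 and Mo2 coding regions. To apply `extract` to the figure itself, the sequence lines would first need to be concatenated with whitespace and any stray non-nucleotide characters removed.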



TGGCGAATGGGACGCGCCCTGTAGCGGCGCATTAAGCGCGGCGGGTGTGGTGGTTACGCGCAGCGT 
GACCGCTACACTTGCCAGCGCCCTAGCGCCCGCTCCTTTCGCTTTCTTCCCTTCCTTTCTCGCCAC 
GTTCGCCGGCTTTCCCCGTCAAGCTCTAAATCGGGGGCTCCCTTTAGGGTTCCGATTTAGTGCTTT 
ACGGCACCTCGACGCCAAAAAACTTGATTAGGGTGATGGTTCACGTAGTGGGCCATCGCCCTGATA 
GACGGTTTTTCGCCCTTTGACGTTGGAGTCCACGTTCTTTAATAGTGGACTCTTGTTCCAAACTGG 
AACAACACTCAACCCTATCTCGGTCTATTCTTTTGATTTATAAGGGATTTTGCCGATTTCGGCCTA 
TTGGTTAAAAAATGAGCTGATTTAACAAAAATTTAACGGGAATTTTAACAAAATATTAACGTTTAC 
AATTTCAGGTGGCACTTTTCGGGGAAATGTGCGCGGAACCCCTATTTGTTTATTTTTGTAAATACA 
TTCAAATATGTATCCGCTCATGAATTAATTCTTAGAAAAACTCATCGAGCATCAAATGAAACTGCA 
ATTTATTCATATCAGGATTATCAATACCATATTTTTGAAAAAGCCGTTTCTGTAATGAAGGAGAAA 
ACTCACCGAGGCAGTTCCATAGGATGGCAAGATCCTGGTATCGGTCTGCGATTCCGACTCGTCCAA 
CATCAATACAACCTATTAATTTCCCCTCGTCAAAAATAAGGTTATCAAGTGAGAAATCACCATGAG 
TGACGACTGAATCCGGTGAGAATGGCAAAAGTTTATGCATTTCTTTCCAGACTTGTTCAACAGGCC 
AGCCATTACGCTCGTGATCAAAATCACTCGCATCAACCAAACCGTTATTCATTCGTGATTGCGCCT 
GAGCGAGAGGAAATACGCGATCGCTGTTAAAAGGACAATTACAAACAGGAATCGAATGCAACCGGC 
GCAGGAACACTGCCAGGGCATGAACAATATTTTCACCTGAATCAGGATATTCTTCTAATACCTGGA 
ATGCTGTTTTCCCGGGGATCGCAGTGGTGAGTAACCATGCATCATGAGGAGTACGGATAAAATGCT 
TGATGGTCGGAAGAGGGATAAATTCCGTCAGCCAGTTTAGTCTGACCATCTCATCTGTAACATCAT 
TGGCAACGCTACCTTTGCCATGTTTCAGAAACAACTCTGGCGCATCGGGCTTCCCATACAATCGAT 
AGATTGTCGCACCTGATTGCCCGACATTATCGCGAGCGCATTTATACCCATATAAATCAGCATCCA 
TGTTGGAATTTAATCGCGGCCTAGAGCAAGACGTTTCCCGTTGAATATGGCTCATAACACCCCTTG 
TATTACTGTTTATGTAAGCAGACAGTTTTATTGTTCATGACCAAAATCCCTTAACGTGAGTTTTCG 
TTCCACTGAGCGTCAGACCCCGTAGAAAAGATCAAAGGATCTTCTTGAGATCCTTTTTTTCTGCGG 
GTAATCTGCTGCTTGCAAACAAAAAAACCACCGCTACCAGCGGTGGTTTGTTTGCCGGATCAAGAG 
CTACCAACTCTTTTTCCGAAGGTAACTGGCTTCAGCAGAGCGCAGATACCAAATACTGTCCTTCTA 
GTGTAGCCGTAGTTAGGCCACCACTTCAAGAACTCTGTAGCACCGCCTACATACCTGGGTCTGCTA 
ATCCTGTTACCAGTGGCTGCTGCCAGTGGCGATAAGTCGTGTCTTACCGGGTTGGACTCAAGACGA 
TAGTTACCGGATAAGGCGCAGCGGTCGGGCTGAACGGGGGGTTCGTGCACACAGCCCAGCTTGGAG 
CGAACGACCTACACCGAACTGAGATACCTACAGCGTGAGCTATGAGAAAGCGCCACGCTTCCCGAA 
GGGAGAAAGGCGGACAGGTATCCGGTAAGCGGCAGGGTCGGAACAGGAGAGCGCACGAGGGAGGTT 
CCAGGGGGAAACGCCTGGTATCTTTATAGTCCTGTCGGGTTTCGCCACCTCTGACTTGAGCGTCGA 
TTTTTGTGATGCTCGTCAGGGGGGCGGAGCCTATGGAAAAACGCCAGCAACGCGGCCTTTTTACGG 
TTCCTGGCCTTTTGCTGGCCTTTTGCTCACATGTTCTTTCCTGCGTTATCCCCTGATTCTGTGGAT 
AACCGTATTACCGCCTTTGAGTGAGCTGATACCGCTCGCCGCAGCCGAACGACCGAGGGCAGCGAG 
TCAGTGAGCGAGGAAGCGGAAGAGCGCGTGATGCGGTATTTTCTCCTTACGCATCTGTGCGGTATT 
TCACACCGCATATATGGTGCACTCTCAGTACAATCTGCTCTGATGCCGCATAGTTAAGCCAGTATA 
CACTCCGCTATCGCTACGTGACTGGGTCATGGCTGCGCCCCGACACCCGCCAACACCCGCTGACGC 

GCCCTGACGGGCTTGTCTGCTCCCGGCATCCGCTTACAGACAAGCTGTGACCGTCTCCGGGAGCTG 

CATGTGTCAGAGGTTTTCACCGTCATCACCGAAACGCGCGAGGCAGCTGCGGTAAAGCTCATCAGC 

GTGGTCGTGAAGCGATTCACAGATGTCTGCCTGTTCATCCGCGTCCAGCTCGTTGAGTTTCTCCAG 

AAGCGTTAATGTCTGGCTTCTGATAAAGCGGGCCATGTTAAGGGCGGTTTTTTCCTGTTTGGTCAC 

TGATGCCTCCGTGTAAGGGGGATTTCTGTTCATGGGGGTAATGATACCGATGAAACGAGAGAGGAT 

GCTCACGATACGGGTTACTGATGATGAACATGCCCGGTTACTGGAACGTTGTGAGGGTAAACAACT 

GGCGGTATGGATGGGGCGGGACCAGAGAAAAATCACTCAGGGTCAATGCCAGCGCTTCGTTAATAC 

AGATGTAGGTGTTCCACAGGGTAGCCAGCAGCATCCTGCGATGCAGATCCGGAACATAATGGTGCA 

GGGCGCTGACTTCCGCGTTTCCAGACTTTACGAAACACGGAAACCGAAGACCATTCATGTTGTTGC 

TCAGGTCGCAGACGTTTTGCAGCAGCAGTCGCTTCACGTTGGCTCGCGTATCGGTGATTCATTCTG 

CTAACCAGTAAGGCAACCGCGCCAGCCTAGCCGGGTCCTCAACGACAGGAGCACGATCATGCGCAC 

CCGTGGGGCCGCCATGCGGGCGATAATGGCCTGCTTCTCGCCGAAACGTTTGGTGGCGGGACCAGT 

GACGAAGGCTTGAGCGAGGGCGTGGAAGATTCCGAATACCGCAAGCGACAGGCCGATCATCGTCGC 

GCTCCAGCGAAAGCGGTCCTGGCCGAAAATGACCCAGAGCGCTGCCGGCACCTGTCCTACGAGTTG 

CATGATAAAGAAGACAGTCATAAGTGCGGCGACGATAGTCATGCCCCGCGCCCACCGGAAGGAGCT 

GACTGGGTTGAAGGCTCTCAAGGGCATCGGTCGAGATCCCGGTGCCTAATGAGTGAGCTAACTTAC 

ATTAATTGCGTTGCGCTCACTGGCCGCTTTCCAGTCGGGAAACCTGTCGTGCGAGCTGGATTAATG 

AATCGGCCAACGCGCGGGGAGAGGCGGTTTGCGTATTGGGCGCCAGGGTGGTTTTTCTTTTCACCA 

GTGAGACGGGCAACAGCTGATTGCCCTTCACCGCCTGGCCCTGAGAGAGTTGCAGCAAGCGGTCCA 

CGCTGGTTTGCCCCAGCAGGCGAAAATCCTGTTTGATGGTGGTTAACGGCGGGATATAACATGAGC 

TGTCTTCGGTATCGTCGTATCCCACTACCGAGATATCCGCACCAACGCGCAGCCCGGACTCGGTAA 

TGGCGGGCATTGCGCCCAGCGCCATCTGATCGTTGGCAACCAGCATCGCAGTGGGAACGATGCCCT 

CATTCAGCATTTGGATGGTTTGTTGAAAACCGGACATGGCACTCCAGTCGCCTTCCCGTTCCGCTA 

TCGGCTGAATTTGATTGCGAGTGAGATATTTATGCCAGCCAGCCAGACGCAGACGCGCCGAGACAG 

AACTTAATGGGCCCGCTAACAGCGCGATTTGCTGGTGACCCAATGCGACCAGATGCTCCACGCCCA 

GTCGCGTACCGTCTTCATGGGAGAAAATAATACTGTTGATGGGTGTCTGGTCAGAGACATCAAGAA 

ATAACGCCGGAACATTAGTGCAGGCAGCTTCCACAGCAATGGCATCCTGGTCATCCAGCGGATAGT 

TAATGATCAGCCCACTGACGCGTTGCGCGAGAAGATTGTGCACCGCCGCTTTACAGGCTTCGACGC 

CGCTTCGTTCTACCATCGACACCACCACGCTGGCACCCAGTTGATCGGCGCGAGATTTAATCGCCG 

CGACAATTTGCGACGGCGCGTGCAGGGCCAGACTGGAGGTGGCAACGCCAATCAGCAACGACTGTT 

TGCCCGCCAGTTGTTGTGCCACGCGGTTGGGAATGTAATTCAGCTCCGCCATCGCCGCTTCCACTT 

TTTCCCGCGTTTTCGCAGAAACGTGGCTGGCCTGGTTCACCACGCGGGAAACGGTCTGATAAGAGA 

CACCGGCATACTCTGCGACATCGTATAACGTTACTGGTTTCACATTCACCACCCTGAATTGACTCT 

CTTCCGGGCGCTATCATGCCATACCGCGAAAGGTTTTGCGCCATTCGATGGTGTCCGGGATCTCGA 

CGCTCTCCCTTATGCGACTCCTGCATTAGGAAGCAGCCCAGTAGTAGGTTGAGGCCGTTGAGCACC 

GCCGCCGCAAGGAATGGTGCATGCAAGGAGATGGCGCCCAACAGTCCCCCGGCCACGGGGCCTGCC 

ACCATACCCACGCCGAAACAAGCGCTCATGAGCCCGAAGTGGCGAGCCCGATCTTCCCCATCGGTG 

ATGTCGGCGATATAGGCGCCAGCAAGCGCACCTGTGGCGCCGGTGATGCCGGCCACGATGCGTCCG 

GCGTAGAGGATCGAGATCTCGATCCCGCGAAATTAATACGACTCACTATAGGGGAATTGTGAGCGG 

ATAACAATTCCCCTCTAGAAATAATTTTGTTTAACTTTAAGAAGGAGATATACATATGCAGCATCA 

CCACCATCACCACACTGATCGCGTGTCGGTGGGCAACTTGCGCATCGCTCGGGTGGTCTACGACTT 

CGTGAACAATGAAGCCGTGCCTGGCACCGATATCGACCCGGACAGCTTCTGGGCGGGCGTCGACAA 
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GTCGTCGCCGACCTGACCCCGCAGAACCAAGCTCTGTTGAACGCCCGCGACGAGCTGCAGGCGCAG 

ATCGACAAGTGGCACCGGCGTCGGGTGATCGAGCCCATCGACATGGATGCCTACCGCCAGTTCCTC 

ACCGAGATCGGCTAGCTGCTTCCCGAACCTGATGACTTCACCATCACCACGTCCGGTGTCGACGCT 

GAGATCACCACGACCGCCGGCCCCCAGCTGGTGGTGCCGGTGCTCAACGCGCGGTTTGCTCTGAAC 

GCGGCCAACGCTCGCTGGGGCTCCCTCTACGACGCCTTGTATGGCACCGATGTCATCCCCGAGACC 

GACGGCGCCGAAAAAGGCCCCACGTACAACAAGGTTCGTGGCGACAAGGTGATCGCGTATGCCCGG 

AAGTTCCTCGACGACAGTGTTCCGCTGTCGTCGGGTTCCTTTGGCGACGCCACCGGTTTCACAGTG 

CAGGATGGCCAGCTCGTGGTTGCCTTGCCGGATAAGTCCACCGGCCTGGCCAACCCCGGCCAGTTC 

GCCGGCTACACCGGCGCAGCCGAGTCGCCGACATCGGTGCTGCTAATCAATCACGGTTTGCACATC 

GAGATCCTGATCGATCCGGAGTCGCAGGTCGGCACCACCGACCGGGCCGGCGTCAAGGACGTGATC 

CTGGAATCCGCGATCACCACGATCATGGACTTCGAGGACTCGGTGGCCGCCGTGGACGCCGCCGAC 

AAGGTGCTGGGTTATCGGAACTGGCTCGGCCTGAACAAGGGCGACCTGGCAGCAGCGGTAGACAAG 

GACGGCACCGCTTTCCTGCGGGTGCTCAATAGGGACCGGAACTACACCGCACCCGGCGGTGGCCAG 

TTCACGCTGCCTGGACGCAGCCTCATGTTCGTCCGCAACGTCGGTCACTTGATGACGAATGACGCC 

ATCGTCGACACTGACGGCAGCGAGGTGTTCGAAGGCATCATGGATGCCCTATTCACCGGCCTGATC 

GCCATCCACGGGCTAAAGGCCAGCGACGTCAACGGGCCGCTGATCAACAGCCGCACCGGCTCCATC 

TACATCGTCAAGCCGAAGATGCACGGTCCGGCCGAGGTGGCGTTTACCTGCGAACTGTTCAGCCGG 

GTTGAAGATGTGCTGGGGTTGCCGCAAAACACCATGAAGATCGGCATCATGGACGAGGAACGCCGG 

ACCACGGTCAACCTCAAGGCGTGCATCAAAGCTGCCGCGGACCGCGTGGTGTTCATCAACACCGGG 

TTCCTGGACCGCACCGGCGATGAAATCCACACCTCGATGGAGGGCGGCCCGATGGTGCGCAAGGGC 

ACCATGAAGAGCCAGCCGTGGATCTTGGCCTACGAGGACCACAACGTCGATGCCGGCCTGGCCGCC 

GGGTTCAGCGGCCGAGCCCAGGTCGGCAAGGGCATGTGGACAATGACCGAGCTGATGGCCGACATG 

GTCGAGACAAAAATCGCCCAGCCGCGCGCCGGGGCCAGCACCGCCTGGGTTCCCTCTCCCACTGCG 

GCCACCCTGCATGCGCTGCACTACCACCAGGTCGACGTCGCCGCGGTGCAACAAGGACTGGCGGGG 

AAGCGTCGCGCCACCATCGAACAATTGCTGACCATTCCGCTGGCCAAGGAATTGGCCTGGGCTCCC 

GACGAGATCCGCGAAGAGGTCGACAACAACTGTCAATCCATCCTCGGCTACGTGGTTCGCTGGGTT 

GATCAAGGTGTCGGCTGCTCGAAGGTGCCCGACATCCACGACGTCGCGCTCATGGAGGACCGGGCC 

ACGCTGCGAATCTCCAGCCAATTGTTGGCCAACTGGCTGCGCCACGGTGTGATCACCAGCGCGGAT 

GTGCGGGCCAGCTTGGAGCGGATGGCGCCGTTGGTCGATCGACAAAACGCGGGCGACGTGGCATAC 

CGACCGATGGCACCCAACTTCGACGACAGTATCGCCTTCCTGGCCGCGCAGGAGCTGATCTTGTCC 

GGGGCCCAGCAGCCCAACGGCTACACCGAGCCGATCCTGCACCGACGTCGTCGGGAGTTTAAGGCC 

CGGGCCGCTGAGAAGCCGGCCCCATCGGACAGGGCCGGTGACGATGCGGCCAGGGTGCAGAAGTAC 

GGCGGATCCTCGGTGGCCGACGCCGAACGGATTCGCCGCGTCGCCGAACGCATCGTCGCCACCAAG 

AAGCAAGGCAATGACGTCGTCGTCGTCGTCTCTGCCATGGGGGATACCACCGACGACCTGCTGGAT 

CTGGCTCAGCAGGTGTGCCCGGCGCCGCCGCCTCGGGAGGTGGACATGCTGCTTACCGCCGGTGAA 

CGCATCTCGAATGCGTTGGTGGCCATGGCCATCGAGTCGCTCGGCGCGCATGCCCGGTCGTTCACC 

GGTTCGCAGGCCGGGGTGATCACCACCGGCACCGACGGCAACGCCAAGATCATCGACGTCACGCCG 

GGGCGGCTGCAAACCGCCCTTGAGGAGGGGCGGGTCGTTTTGGTGGCCGGATTCCAAGGGGTCAGC 

CAGGACACCAAGGATGTCACGACGTTGGGCCGCGGCGGCTCGGACACCACCGCCGTCGCCATGGCC 

GCCGCGCTGGGTGCCGATGTCTGTGAGATCTACACCGACGTGGACGGCATCTTCAGCGCCGACCCG 

CGCATGGTGCGCAACGCCCGAAAGCTCGACACCGTGACCTTCGAGGAAATGCTCGAGATGGCGGCC 

TGCGGCGCCAAGGTGGTGATGCTGCGCTGCGTGGAATACGCTCGCCGCCATAATATTCCGGTGCAC 

GTCCGGTCGTCGTACTCGGACAGACCGGGCACCGTCGTTGTCGGATGGATGAAGGACGTACCCATG 
GAAGACCCCATCCTGACCGGAGTCGCGCACGACCGCAGCGAGGCCAAGGTGACCATCGTCGGGCTG 
CCCGACATCCCCGGGTATGCGGCCAAGGTGTTTAGGGCGGTGGCCAGACGCCGACGTCAACATCGA 
CATGGTGCTGCAGAACGTCTCGAAGGTCGAGGACGGCAAGACCGACATCACCTTCACCTGCTCCCG 
CAGACGTCGGGCCCGCCGCCGTGGAAAAACTGGACTCGCTCAGAAACGAGATCGGCTTCTACACAG 
CTGCTGTACGACGACCACATCGGCAAGGTATCGCTGATCGGTGCCGGCATGCGCAGCCACCCCGGG 
GTCACCGCGACGTTCTGTGAGGCGCTGGCGGCGGTGGGGGTCAACATCGAGCTGATCTCCACCTCG 
GAAGATCAGAGATCTCGGTGTTGTGCCGCGACACCGAACTGGACAAGGCCGTGGTCGCGCTGCATG 
AAGCGTTCGGGCTCGGCGGCGACGAGGAGGCCACGGTGTACGCGGGGACGGGACGGTAGATGGGCC 
TGTCAATAGTGAATTCATCGATGTGCAGATATCCATCACACTGGCGGCCGCTCGAGCACCACCACC 
ACCACCACTGAGATCCGGCTGCTAACAAAGCCCGAAAGGAAGCTGAGTTGGCTGCTGCCACCGCTG 
AGCAATAACTAGCATAACCCCTTGGGGCCTCTAAACGGGTCTTGAGGGGTTTTTTGCTGAAAGGAG 
GAACTATATCCGGAT 

TGGCGAATGGGACGCGCCCTGTAGCGGCGCATTAAGCGCGGCGGGTGTGGTGGTTACGCGCAGCGT 
GACCGCTACACTTGCCAGCGCCCTAGCGCCCGCTCCTTTCGCTTTCTTCCCTTCCTTTCTCGCCAC 
GTTCGCCGGCTTTCCCCGTCAAGCTCTAAATCGGGGGCTCCCTTTAGGGTTCCGATTTAGTGCTTT 
ACGGCACCTCGAGCCCAAAAAACTTGATTAGGGTGATGGTTCACGTAGTGGGCCATCGCCCTGATA 
GACGGTTTTTCGCCCTTTGACGTTGGAGTCCACGTTCTTTAATAGTGGACTCTTGTTGCAAACTGG 
AACAACACTCAACCCTATCTCGGTCTATTCTTTTGATTTATAAGGGATTTTGCCGATTTCGGCCTA 
TTGGTTAAAAAATGAGCTGATTTAACAAAAATTTAACGCGAATTTTAACAAAATATTAACGTTTAC 
AATTTCAGGTGGCACTTTTCGGGGAAATGTGCGCGGAACCCCTATTTGTTTATTTTTCTAAATACA 
TTCAAATATGTATCCGCTCATGAATTAATTCTTAGAAAAACTCATCGAGCATCAAATGAAACTGCA 
ATTTATTCATATCAGGATTATCAATACCATATTTTTGAAAAAGCCGTTTCTGTAATGAAGGAGAAA 
ACTCACCGAGGCAGTTCCATAGGATGGCAAGATCCTGGTATCGGTCTGCGATTCCGACTCGTCCAA 
CATCAATACAACCTATTAATTTCCCCTCGTCAAAAATAAGGTTATCAAGTGAGAAATCACCATGAG 
TGACGACTGAATCCGGTGAGAATGGCAAAAGTTTATGCATTTCTTTCCAGACTTGTTCAACAGGCC 
AGCCATTACGCTCGTCATCAAAATCACTCGCATCAACCAAACCGTTATTCATTCGTGATTGCGCCT 
GAGCGAGACGAAATACGCGATCGCTGTTAAAAGGACAATTACAAACAGGAATCGAATGCAACCGGC 
GCAGGAACACTGCCAGCGCATCAACAATATTTTCACCTGAATCAGGATATTCTTCTAATACCTGGA 
ATGGTGTTTTCCCGGGGATCGCAGTGGTGAGTAACCATGCATCATCAGGAGTACGGATAAAATGGT 
TGATGGTCGGAAGAGGCATAAATTCCGTCAGCCAGTTTAGTCTGACCATCTCATCTGTAACATCAT 
TGGCAACGCTACCTTTGCGATGTTTCAGAAACAACTCTGGCGCATCGGGCTTCCCATACAATCGAT 
AGATTGTCGCACCTGATTGCCCGACATTATCGCGAGCCCATTTATACCCATATAAATCAGCATCCA 
TGTTGGAATTTAATCGCGGCCTAGAGCAAGACGTTTCCCGTTGAATATGGCTCATAACACCCCTTG 
TATTACTGTTTATGTAAGCAGACAGTTTTATTGTTCATGACCAAAATCCCTTAACGTGAGTTTTCG 
TTCCACTGAGCGTCAGACCCCGTAGAAAAGATCAAAGGATCTTCTTGAGATCCTTTTTTTCTGCGG 
GTAATCTGCTGCTTGGAAACAAAAAAACCACCGCTACCAGCGGTGGTTTGTTTGCCGGATCAAGAG 
CTACCAACTCTTTTTCCGAAGGTAACTGGCTTCAGCAGAGCGCAGATACCAAATACTGTCCTTCTA 
GTGTAGCCGTAGTTAGGCCACCACTTCAAGAACTCTGTAGCACCGCCTACATACCTCGCTCTGCTA 
ATCCTGTTACCAGTGGCTGCTGCCAGTGGCGATAAGTCGTGTCTTACCGGGTTGGACTCAAGACGA 
TAGTTACCGGATAAGGCGCAGCGGTCGGGCTGAACGGGGGGTTCGTGCACACAGCCCAGCTTGGAG 
CGAACGACCTACACCGAACTGAGATACCTACAGCGTGAGCTATGAGAAAGCGCCACGCTTCCCGAA 
GGGAGAAAGGCGGACAGGTATCCGGTAAGCGGCAGGGTCGGAACAGGAGAGCGCACGAGGGAGCTT 
CCAGGGGGAAACGCCTGGTATCTTTATAGTCCTGTCGGGTTTCGCCACCTCTGACTTGAGCGTCGA 
TTTTTGTGATGCTCGTCAGGGGGGCGGAGCCTATGGAAAAACGCCAGCAACGCGGCCTTTTTACGG 
TTCCTGGCCTTTTGCTGGCCTTTTGCTCACATGTTCTTTCCTGCGTTATCCCCTGATTCTGTGGAT 

AACCGTATTACCGCCTTTGAGTGAGCTGATACCGCTCGCCGCAGCCGAACGACCGAGCGCAGCGAG 

TCAGTGAGCGAGGAAGCGGAAGAGCGCCTGATGCGGTATTTTCTCCTTACGCATCTGTGCGGTATT 

TCACACCGCATATATGGTGCACTCTCAGTACAATCTGCTCTGATGCCGCATAGTTAAGCCAGTATA 

CAGTCCGCTATCGCTACGTGACTGGGTCATGGCTGCGCCCCGACACCCGCCAACACCCGCTGACGC 

GCCCTGACGGGCTTGTCTGCTCCCGGCATCCGCTTACAGACAAGCTGTGACCGTCTCCGGGAGCTG 

CATGTGTCAGAGGTTTTCACCGTCATCACCGAAACGCGCGAGGCAGCTGCGGTAAAGCTCATCAGC 

GTGGTCGTGAAGCGATTGACAGATGTCTGCCTGTTCATCCGCGTCCAGCTCGTTGAGTTTCTCCAG 

AAGCGTTAATGTCTGGCTTCTGATAAAGCGGGCCATGTTAAGGGCGGTTTTTTCCTGTTTGGTCAC 

TGATGCCTCCGTGTAAGGGGGATTTCTGTTCATGGGGGTAATGATACCGATGAAACGAGAGAGGAT 

GCTCACGATACGGGTTACTGATGATGAACATGCCCGGTTACTGGAACGTTGTGAGGGTAAACAACT 

GGCGGTATGGATGCGGCGGGACCAGAGAAAAATCACTCAGGGTCAATGCCAGCGCTTCGTTAATAC 

AGATGTAGGTGTTCCACAGGGTAGCCAGCAGCATCCTGCGATGCAGATCCGGAACATAATGGTGCA 

GGGCGCTGACTTCCGCGTTTCCAGACTTTACGAAACACGGAAACCGAAGACCATTCATGTTGTTGC 

TCAGGTCGCAGACGTTTTGCAGCAGCAGTCGCTTCACGTTCGCTCGCGTATCGGTGATTCATTCTG 

CTAACCAGTAAGGCAACCCCGCCAGCCTAGCCGGGTCCTCAACGACAGGAGCACGATCATGCGCAC 

CCGTGGGGCCGCCATGCCGGCGATAATGGCCTGCTTCTGGCCGAAACGTTTGGTGGCGGGACCAGT 

GACGAAGGCTTGAGCGAGGGCGTGCAAGATTCCGAATACCGCAAGCGACAGGCCGATCATCGTCGC 

GCTCCAGCGAAAGCGGTCCTCGCCGAAAATGACCCAGAGCGCTGCCGGCACCTGTCCTACGAGTTG 

CATGATAAAGAAGACAGTCATAAGTGCGGCGACGATAGTCATGCCCCGCGCCCACCGGAAGGAGCT 

GACTGGGTTGAAGGCTCTCAAGGGCATCGGTCGAGATCCCGGTGCCTAATGAGTGAGCTAACTTAC 

ATTAATTGCGTTGCGCTCACTGCCCGCTTTCCAGTCGGGAAACCTGTCGTGCCAGCTGCATTAATG 

AATCGGCCAACGCGCGGGGAGAGGCGGTTTGCGTATTGGGCGCCAGGGTGGTTTTTCTTTTCACCA 

GTGAGACGGGCAACAGCTGATTGCCCTTCACCGCCTGGCCCTGAGAGAGTTGCAGCAAGCGGTCCA 

CGCTGGTTTGCCCCAGCAGGCGAAAATCCTGTTTGATGGTGGTTAACGGCGGGATATAACATGAGC 

TGTCTTCGGTATCGTCGTATCCCACTACCGAGATATCCGCACCAACGCGCAGCCCGGACTCGGTAA 

TGGCGCGCATTGCGCCCAGCGCCATCTGATCGTTGGCAACCAGCATCGCAGTGGGAACGATGCCCT 

CATTCAGCATTTGCATGGTTTGTTGAAAACCGGACATGGCACTCCAGTCGCCTTCCCGTTCCGCTA 

TCGGCTGAATTTGATTGCGAGTGAGATATTTATGCCAGCCAGCCAGACGCAGACGCGCCGAGACAG 

AACTTAATGGGCCCGCTAACAGCGCGATTTGCTGGTGACCCAATGCGACCAGATGCTCCACGCCCA 

GTCGCGTACCGTCTTCATGGGAGAAAATAATACTGTTGATGGGTGTCTGGTCAGAGACATCAAGAA 

ATAACGCCGGAACATTAGTGCAGGCAGCTTCCACAGCAATGGCATCCTGGTCATCCAGCGGATAGT 

TAATGATCAGCCCACTGACGCGTTGCGCGAGAAGATTGTGCACCGCCGCTTTACAGGCTTCGACGC 

CGCTTCGTTCTACCATCGACACCACCACGCTGGCACCCAGTTGATCGGCGCGAGATTTAATCGCCG 

CGACAATTTGCGACGGCGCGTGCAGGGCCAGACTGGAGGTGGCAACGCCAATCAGCAACGACTGTT 

TGCCCGCCAGTTGTTGTGCCACGCGGTTGGGAATGTAATTCAGCTCCGCCATCGCCGCTTCCACTT 

TTTCCCGCGTTTTCGCAGAAACGTGGCTGGCCTGGTTCACCACGCGGGAAACGGTCTGATAAGAGA 

CACCGGCATACTCTGCGACATCGTATAACGTTACTGGTTTCACATTCACCACCCTGAATTGACTCT 

CTTCCGGGCGCTATCATGCCATACCGCGAAAGGTTTTGCGCCATTCGATGGTGTCCGGGATCTCGA 

CGCTCTCCCTTATGCGACTCCTGCATTAGGAAGCAGCCCAGTAGTAGGTTGAGGCCGTTGAGCACC 

GCCGCCGCAAGGAATGGTGCATGCAAGGAGATGGCGCCCAAGAGTCCCCCGGCCACGGGGCCTGCC 

ACCATACCGACGCCGAAACAAGCGCTCATGAGCCCGAAGTGGCGAGCCCGATCTTCCCCATCGGTG 

ATGTCGGCGATATAGGCGCCAGCAACCGCACCTGTGGCGCCGGTGATGCCGGCCACGATGCGTCCG 
GCGTAGAGGATCGAGATCTCGATCCCGCGAAATTAATACGACTCACTATAGGGGAATTGTGAGCGG 
ATAACAATTCCCCTCTAGAAATAATTTTGTTTAACTTTAAGAAGGAGATATACATATGGGCCATCA 
TCATCATCATCACGTGATCGACATCATCGGGACCAGCCCCACATCCTGGGAACAGGCGGCGGCGGA 
GGCGGTCCAGCGGGCGCGGGATAGCGTCGATGACATCCGCGTCGCTCGGGTCATTGAGCAGGACAT 
GGCCGTGGACAGCGCCGGCAAGATCACCTACCGCATCAAGCTCGAAGTGTCGTTCAAGATGAGGCC 
GGCGCAACCGAGGTGTGGCTCGAAACCACCGAGCGGTTCGCCTGAAACGGGCGCCGGCGCCGGTAC 
TGTCGCGACTACCCCCGCGTCGTCGCCGGTGACGTTGGCGGAGACCGGTAGCACGCTGCTCTACCC 
GCTGTTCAACCTGTGGGGTCCGGCCTTTCACGAGAGGTATCCGAACGTCACGATCACCGCTCAGGG 
CACCGGTTCTGGTGCCGGGATCGCGCAGGCCGCCGCCGGGACGGTCAACATTGGGGCCTCCGACGC 
CTATCTGTCGGAAGGTGATATGGCCGCGCACAAGGGGCTGATGAACATCGCGCTAGCCATCTCCGC 
TCAGCAGGTCAACTACAACCTGCCCGGAGTGAGCGAGCACCTCAAGCTGAACGGAAAAGTCCTGGC 
GGCCATGTACCAGGGCACCATCAAAACCTGGGACGACCCGCAGATCGCTGCGCTCAACCCCGGCGT 
GAACCTGCCCGGCACCGCGGTAGTTCCGCTGCACCGCTCCGACGGGTCCGGTGACACCTTCTTGTT 
CACCCAGTACCTGTCCAAGCAAGATCCCGAGGGCTGGGGCAAGTCGCCCGGCTTCGGCACCACCGT 
CGACTTCCCGGCGGTGCCGGGTGCGCTGGGTGAGAACGGCAACGGCGGCATGGTGACCGGTTGCGC 
CGAGACACCGGGCTGCGTGGCCTATATCGGCATCAGCTTCCTCGACCAGGCCAGTCAACGGGGACT 
CGGCGAGGCCCAACTAGGCAATAGCTCTGGCAATTTCTTGTTGCCCGACGCGCAAAGCATTCAGGC 
CGCGGCGGCTGGCTTCGCATCGAAAACCCGGGCGAAGCAGGCGATTTCGATGATCGACGGGCCCGC 
CCCGGACGGCTACCCGATCATCAACTACGAGTACGCCATCGTCAACAACCGGCAAAAGGACGCGGC 
CACCGCGCAGACCTTGCAGGCATTTCTGCACTGGGCGATCACCGACGGCAACAAGGCCTCGTTCCT 
CGACCAGGTTCATTTCCAGCCGCTGCCGCCCGCGGTGGTGAAGTTGTCTGACGCGTTGATCGCGAC 
GATTTCCAGCGGTGAGATGAAGACCGATGCCGCTACCCTCGCGCAGGAGGCAGGTAATTTCGAGCG 
GATCTCCGGCGACCTGAAAACCCAGATCGACCAGGTGGAGTCGACGGCAGGTTCGTTGCAGGGCCA 
GTGGCGCGGCGCGGCGGGGACGGCCGGCCAGGCCGCGGTGGTGCGCTTCCAAGAAGCAGCCAATAA 
GCAGAAGCAGGAACTCGACGAGATCTCGACGAATATTCGTCAGGCCGGCGTCCAATACTCGAGGGC 
CGACGAGGAGCAGCAGGAGGCGCTGTCCTGGCAAATGGGCTTTACTCAGTCGCAGACCGTGACGGT 
GGATCAGCAAGAGATTTTGAACAGGGCCAACGAGGTGGAGGCCCCGATGGCGGACCCACCGACTGA 
TGTCCCCATCACACCGTGCGAACTCACGGCGGCTAAAAACGGCGCCCAACAGCTGGTATTGTCCGC 
CGAG?ACATGCGGGAATACCTGGCGGCCGGTGCCAAAGAGCGGCAGCGTCTGGCGACCTCGCTGCG 
CAACGCGGCCAAGGCGTATGGCGAGGTTGATGAGGAGGCTGCGACCGCGCTGGACAACGACGGCGA 
AGGAACTGTGCAGGCAGAATCGGCCGGGGCCGTCGGAGGGGACAGTTCGGCCGAACTAACCGATAC 
GCCGAGGGTGGCCACGGCCGGTGAACCCAACTTCATGGATCTCAAAGAAGCGGCAAGGAAGCTCGA 
AACGGGCGACCAAGGCGCATCGCTCGCGCACTTTGCGGATGGGTGGAACACTTTCAACCTGACGCT 
GCAAGGCGACGTCAAGCGGTTCCGGGGGTTTGACAACTGGGAAGGCGATGCGGCTACCGCTTGCGA 
GGCTTCGCTCGATCAACAACGGCAATGGATACTCCACATGGCCAAATTGAGCGCTGCGATGGCCAA 
GCAGGCTCAATATGTCGCGCAGCTGCACGTGTGGGCTAGGCGGGAACATCCGACTTATGAAGACAT 
AGTCGGGCTCGAACGGCTTTACGCGGAAAACCCTTCGGCCCGCGACCAAATTCTCCCGGTGTACGC 
GGAGTATCAGCAGAGGTCGGAGAAGGTGCTGACCGAATACAACAACAAGGCAGCCCTGGAACCGGT 
AAACCCGCCGAAGCCTCCCCCCGCCATCAAGATCGACCCGCCCCCGCCTCCGCAAGAGCAGGGATT 
GATCCCTGGCTTCCTGATGCCGCCGTCTGACGGCTCCGGTGTGACTCCCGGTACCGGGATGCCAGC 
CGCACCGATGGTTCCGCCTACCGGATCGCCGGGTGGTGGCCTCCCGGGTGACACGGCGGCGCAGCT 
GACGTCGGCTGGGCGGGAAGCCGCAGCGCTGTCGGGCGACGTGGCGGTCAAAGCGGCATCGCTCGG 
TGGCGGTGGAGGCGGCGGGGTGCCGTCGGCGCCGTTGGGATCCGCGATCGGGGGCGCCGAATCGGT 
GCGGCCCGCTGGCGCTGGTGACATTGCCGGCTTAGGCCAGGGAAGGGCCGGCGGCGGCGCCGCGCT 
GGGCGGCGGTGGCATGGGAATGCCGATGGGTGCCGCGCATCAGGGACAAGGGGGCGCCAAGTCCAA 
GGGTTCTCAGCAGGAAGACGAGGCGCTCTACACCGAGGATCGGGCATGGACCGAGGCCGTCATTGG 
TAACCGTCGGCGCCAGGACAGTAAGGAGTCGAAGTGAATTCTGCAGATATCCATCACACTGGCGGC 
CGCTCGAGCACCACCACCACCACCACTGAGATGCGGCTGCTAACAAAGCCCGAAAGGAAGCTGAGT 
TGGCTGCTGCCACCGCTGAGCAATAACTAGCATAACCCCTTGGGGCCTCTAAACGGGTCTTGAGGG 
GTTTTTTGCT GAAAGGAGGAACTAT AT C CGGAT 
MQHHHHHHTDRVSVGNLRIARVLYDFVNNEALPGTDIDPDSFWAGVDKWADLTPQNQALLNARDE 
LQAQIDKWHRRRVIEPIDMDAYRQFLTEIGYLLPEPDDFTITTSGVDAEITTTAGPQLWPVLNAR 
FALNAANARWGSLYDALYGTDVIPETDGAEKGPTYNKVRGDKVIAYARKFLDDSVPLSSGSFGDAT 
GFWQDGQLWALPDKSTGLANPGQFAGYTGAAESPTSVLLINHGLHIEILIDPESQVGTTDRAGV 
KDVILESAITTIMDFEDSVAAVDAADKV^ 

GGGQFTLPGRSLMFVRNVGHLMTNDAIVDTDGSEVFEGIMDALFTGLIAIHGLKASDVNGPLINSR 
TGSIYIVKPKMHGPAEVAFTCELFSRVEDVLGLPQNTMKIGIMDEERRTTWLKACIKAAADRWF 
INTGFLDRTGDEIHTSMEAGPMVRKGTMK^ 

MADMVETKIAQPRAGASTAWPSPTAATLHALHYHQVDVAAVQQGLAGKRRATIE 

AWAPDEIREEVDNNCQSILGYWRWVDQGVGCSKVPDIHDVALMEDRATLRISSQLLANWLRHGVI 

TSADVRASLERMAPLVDRQNAGDVAYRPMAPOT^ 

EFKARAAEKPAPSDRAGDDAARVQKYGGSSVADAERIRRVAERIVATKKQGNDVVVWSAM 
DLLDLAQQVCPAPPPRELDMLLTAGERISNALVAMAIESLGAHARSFTGSQAGVITTGTHGNAKII 
DVTPGRLQTALEEGRWLVAGFQGVSQDTKDVTTLGRGGSDTTAVAMAAALGADVCEIYTDVDGIF 
SADPRIVRNARKLDTVTFEEMLEMAACGAKVLMLRCVEYARRHNIPVHW 

DVPMEDPILTGVAHDRSEAKVTIVGLPDIPGYAAKV"FRAVARRRRQHRHGAAERLQGRGRQDRHHL 
HLLPQTSGPPPWKNWTRSETRSASTQLLYDDHIGKVS 

I S T S EDQRS RC C AAT PNWTRP WS RCMKRS GS AATRR PRCTRGRDGR WAC Q . . 
MGHHHHHHVIDIIGTSPTSWEQAAAEAVQRARDSVDDIRVARVIEQDMAVDSAGKITYRIKLEVSF 

KMRPAQPRCGSKPPSGSPETGAGAGTVATTPASSPVTLAETGSTLLYPLFNLWGPAFHERYPNVTI 

TAQGTGSGAGIAQAAAGTWIGASDAYLSEGDMAAHKGLMNIALAISAQQV^ 

KVLAAMYQGTIKTWDDPQIAALNPGWLPGTAWPLHRSDGSGDTFLFTQYLSKQDPEGWGKSPGF 

GTTVDFPAVPGALGENGNGGMVTGCAETPGCVAYIGISFLDQASQRGLGEAQLGNSSGNFLLPDAQ 

SIQAAAAGFASKTPANQAISMIDGPAPDGYPIINYEYAIVNNRQKDAATAQTLQAFLHWAITDGNK 

ASFLDQVHFQPLPPAWKLSDALIATISSAEMKTDAATLAQEAGNFERISGDLKTQIDQVESTAGS 

LQGQWRGAAGTAAQAAVVRFQEAANKQKQELDEISTNIRQAGVQYSRADEEQQQALSSQMGFTQSQ 

TVTVDQQEILNRANEVEAPMADPPTDVPITPCELTAAKNAAQQLVLSADNMREYLAAGAKERQRLA 

TSLRNAAKAYGEVDEEAATALDNDGEGTV^^ 

RKLETGDQGASLAHFADGWNTFNLTLQGDVK^ 

AMAKQAQYVAQLHVWARREHPTYEDIVGLERLYAENPSARDQILPVYAEYQQRSEKVLTEYNNKAA 
LEPVNPPKPPPAIKIDPPPPPQEQGLIPGFLMPPSDGSGVTPGTGMPAAPMVPPTGSPGGGLPADT 
AAQLTSAGREAAALSGDVAVKAASLGGGGGGGVPSAPLGSAIGGAESVRPAGAGDIAGLGQGRAGG 
GAALGGGGMGMPMGAAHQGQGGAKSKGSQQEDEALYTEDRAWTEAVIGNRRRQDSKESK. 
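
The relationship between the nucleotide listing and the protein listing above can be checked by translating the start of the open reading frame. The sketch below is illustrative only and is not part of the original disclosure: the function name `translate`, the table construction, and the choice of a 48-nucleotide fragment (taken from the nucleotide listing, beginning at the ATG that follows the ribosome binding site) are assumptions made for the example. It uses the standard genetic code.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a DNA string in frame 1, stopping at the first stop codon.

    Codons containing unrecognized characters are reported as 'X'.
    """
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE.get(dna[i:i + 3], "X")
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# First 16 codons of the coding region (Met-Gly, six His codons, then the antigen).
fragment = "ATGGGCCATCATCATCATCATCACGTGATCGACATCATCGGGACCAGC"
print(translate(fragment))  # MGHHHHHHVIDIIGTS
```

The translated fragment agrees with the opening residues of the His-tagged protein listing, which supports reading the OCR-damaged characters at positions 12 and 13 of that listing as two isoleucines.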



5004 
7004 
9004 
11004 
15004 
17004 
18004 
21004 
23004 
26004 
27004 
28004 
30004 
32004 
33004 
34004 
36004 
37004 
39004 
41004 
43004 
44004 
53004 
FD8-24 
FD8-25 
FD8-26 
FD8-27 
FD8-28 
FD8-29 
FD8-30 
FD8-31 
FD8-33 
FD8-34 
FD8-35 
FD8-36 
FD8-37 
FD8-38 
FD8-39 
FD8-40 
FD8-41 
FD8-42 
FD8-43 
FD8-44 
FD8-45 
FD8-46 
FD8-48 
FD8-49 



TbF15 TbF6 




0.244 



0.207 



ill 



MM 

0.317 
0.314 
0.063 
0.142 
0.115 
0.289 
0.238 
0.146 
0.237 
0.071 
0.117 
0.072 
0.089 
0.06 
0.111 
0.241 
0.265 
0.093 
0.273 
0.126 
0.092 
0.057 
0.23 
0.085 
0.247 



              TbF15    TbF6
Mean          0.113    0.157
SD            0.061    0.086
Mean +3SD     0.298    0.414
Sensitivity   22/23    20/23
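
The summary rows of the table can be reproduced with simple arithmetic. The sketch below is a hedged illustration, not the authors' analysis: it assumes that the "Mean +3SD" row is a positivity cutoff computed as the mean plus three standard deviations of a reference set of readings, and that sensitivity is the fraction of tested sera scoring above that cutoff. The function names `cutoff` and `sensitivity` are hypothetical; the numbers are copied from the table.

```python
def cutoff(mean, sd, k=3.0):
    """Positivity threshold: mean of the reference readings plus k standard deviations."""
    return mean + k * sd


def sensitivity(n_positive, n_tested):
    """Fraction of tested sera that scored positive."""
    return n_positive / n_tested


# Values as printed in the table (rounded to three decimals in the source).
summary = {
    "TbF15": {"mean": 0.113, "sd": 0.061, "reported_cutoff": 0.298, "positive": 22, "tested": 23},
    "TbF6": {"mean": 0.157, "sd": 0.086, "reported_cutoff": 0.414, "positive": 20, "tested": 23},
}

for antigen, row in summary.items():
    recomputed = cutoff(row["mean"], row["sd"])
    sens = sensitivity(row["positive"], row["tested"])
    print(
        f"{antigen}: recomputed cutoff {recomputed:.3f} "
        f"(reported {row['reported_cutoff']:.3f}), sensitivity {sens:.1%}"
    )
```

Recomputing from the rounded mean and SD gives 0.296 and 0.415, within a few thousandths of the reported 0.298 and 0.414; the small difference is presumably because the source values were computed before rounding.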



Monday, July 26, 1999 10:42 AM

hTCC#1.seq.mpd (1 > 1200) Site and Sequence

Enzymes : All 515 enzymes (No Filter)

Settings: Circular, Certain Sites Only, Standard Genetic Code

CAGGC^TGAGCAGAGCGTTCATCA rCGATCCAACGATCAGTGCC ATTGACGGC 7TG 7ACGACCT7C7GGGGA -7GG A A "AC CC A£CCAAGGGGG7A 7 r C 7 

\ 1 i : ' : • : 1 : t r . . j i. iqq 

G 7CCG7AC7CGTCTCGCAAGTAGTAGCT *GG TTGCTAG'CACGGTAACTGCCGAACATGC "GGAAG A CCCC 7 A ACC 7 7A7GGGTTGGT TCCCCC^TAGG A 



* HTCC-1 FL 3 



tt 5 R A F [ ( 0 ? T i 3 A i Q G L Y Q L L G [ G [ p n Q G G t L 

T7ACTCCTC ACTAGAGTACTTCGAAAAAGCCCTGGAGGAGCTGGCAGCAGCGTTTCCGGGTGA "GGC7GG77IGG"7CGGCCGCGGACAAAT ACGCf'^GC 

-r— ' i ■ ' " : f 1 " : — — i j- ?qq 

AA7GAGGAGTGA TCTCA TG A AGC T TTTTCGGGACC 7CC TCG AC CGTCG7CGCAAAGGCCC AC TACCGACCAATCCAAGCCGGCGCCTGTTTA 7GCGGC CG 



'HTCC-1 FL 1 



YSSL£YF£KAL££UAAAFPGQGWLGSAAOi(YAG 

A A AA ACCGC A ACC A CGTGA A T7TTTTCCAGGAACTGGCAGACCTCGA TCG TC AGC TC A TCAGCC TGA7CCACGACCAGGCCAACGCGG7CCAGACGACCC 

; , : • . i : a i- ; 300 

T7T7TGGCGT7GG7GCACTTAAAAAAGG7CC77GACCGTC7QGAGC7AGCAG7CGAG7AG7CGGAC TAGGTGCTGGTCCGGT7GCGCCAGGTCTGCTGGG 



'HTCC-1 FL 1 



K NRNHV NFFGcL^QLORQL I 3 L (HCGA-NAVCTT 

GCGACA TC C 7 GG AG3GCGCC A AGA A A GG TC TCG AG "TCG TGCG CC CGGTGGCTGTGGACCTG ACC "AC A TCCCGG TCG TCGGGC A CGCCCTATCGGCCGC 
: i . ■ 1 ■ ■ 1 1 , , r i^-jQ 

CGCTGTAGGACCTCCCGCGGTTCTTTCCAGAGCTCAAGCACGCGGGCCACCGACACCTGGACTGGATGTAGGGCCAGCAGCCCGTGCGGGATAGCCGGCG 



'HTCC-1 FL 1 



RO I LEGAKKGLCFVRPVAVOLTY CPVVGHAL'.SAA 

C7TCCAGGCGCCGTTTTGCGCGGGCGCGATGGCCGTAGTGGGCGGCGCGCTTGCCTACTTGGTCGTGAAAACGCTGATCAACGCGAC7CAACTCCTCAAA 
G A AGGTCCGCGGCA A A ACGCGCCCGCGC TACCGGC A TC A CCCGCCGCGCGA ACGG A TG A ACC AGCAC7TTTGCGACTAGTTGCGCTG AG TTGAGGAGTTT 



500 



1 HTCC-1 FL 1 



r Q A P FCAGAilAVVGGALAYL V V K 7 L E MATQLLK 

TTGCTTGCCAAATTGGCGGAGTTGGTCGCGGCCGCCATTGCGGACATCATTTCGGATGTGGCGGACATCATCAAGGGCACCCTCGGAGAAGTGTGGGAGT 
( ! . ■ 1 i •■ i i 1 r ■ — j , , , u 500 

AACGAACGGTTTAACCGCCTCAACCAGCGCCGGCGGTAACGCCTGTAGTAAAGCCTACACCGCCTGTAGTAGTTCCCGTGGGAGCCTCrTCACACCCTCA 



1 HTCC-1 FL* 



LLAXLA E L V A A A I A 0 I I S 0 V A 0 ItXGTLGEVWE 

TC A TC AC A A ACGCGCTCAACGGCCTGAAAGAGCTTTGGGACAAGCTCACGGGGTGGGTGACCGGACTGTTCTC TCG AGGGTGGTCGA ACC TGGAGTCCTT 
i ! ■ ■ 1 i ■ 1 i f i : , 1 , 1 f h 700 

AGTAG7GTT7GCGCGAGTTGCCGGACTTTCTCGAAACCCTGTTCGAGTGCCCCACCCAC7GGCC7GACAAGAGAGC7CCCACCAGC7TGGACC7CAGGAA 



1 HTCC-1 FL 1 



F I TNALNGLKSLWOKLTGWVTGLFSRGWSNLESF 
CTT7GCGGGCGTCCCCGGCTTGACCGGCGCGACCAGCGGCTTGTCGCAAGTGACTGGC7TGTTCGGTGCGGCCGGTCTGTCCGCATCGTCGGGCTTGGCT 

1 1 i j , i ! : ( i i { , : , 1 , 1 , — h 300 

GAAACGCCCGCAGGGGCCGAACTGGCCGCGCTGGTCGCCGAACAGCGTTCACTGACCGAACAAGCCACGCCGGCCAGACAGGCGTAGCAGCCCGAACCGA 

1 HTCC-1 FL 

r A G V PGLTGATSGL5QVTGLFGAAGLSA55GLA 




CACGCGGA f AGCC rGGCGAGCTCAGCCAGCTTGCCCGCCCTGGCCGGC a "GGGGGCGGG ~£CGGT7rTGGGGGC77 GCCGAGCCTGGCTCAGGTCCATG 

G 7GCGCC 7A7CGGACCGC 7CGAG7CGG7CGAACGGG vGGGACCGGCCG fA AC CCCCGCCCAGGCCAA A ACCCCCGAACGGC "CGGACCGAG7CCAGG7AC ^ 

■ ■■ ■ ■■ ■ ■ ■ ■ " " ' HTCC-t - 1 ■ ■ ■ ■-- 

H A 0 S L A 5 S A S L P A L A G ( GGGSGFGGLPS) A Q V H 



CCGCC7CA AC7CGGC AGGCGC rACGGCCCCGAGC'GA "~^-^'-^^"'^^^^^^^^^~^'-^GAGGAGG"CGGCG GGCAGTCGCAGC~GGTCTCCGCGC AGGG 

1 ~ i 1 fOQG 

GCG7CCC 



GGCGGAG 77GAGCCGrCCGCGA7GCCGGGGC7CGAC7ACCGGGCCAGCCGCGGCGACGGC7CG FCCAGCCGCCCG7CAGCG7CGACCAGAGGC 



•KTCC-1 R. J 



AASTRQALRPRAOG?VGAAA£QVGGQsaLVSAQG 

^ccca^^ 

aaggg rtccatacccgccrgggcatccgtacccgccgfacgtggggagaagcccccgcagctttccctgc t *gc7gc77c77ca7gagcc77ccgcgccgc ' 



•HTCC-1 FL> 



S Q G M GGPVGMGGM HP SSGASKG77T 



'< '< Y S £ G A A 



O GCGGGCA C TGAA GA CG C CGA GCGCG CG CC AGTCGA AG CTGACGCGGGCGGTGGGCAAAAGGrGCTGGTACGA A A CGTCGTCT A A CGGCATGGCGAGCCA A 
•Jf CGCCCG TG AC TTC TGCGGC TCGCGCGCGG TC AGC TTCGAC TGCGCCCGC CACCCG TTTTCCACGACC A TGC TfTGCAGCAGA fTGCCGT A CCGCTCGGTT 

51 ' - ~ 



ui — ► 

fJJAGTEQAESAPVSAQAGGGQKVLVRNVV. 




Monday, July 26, 1999 10:49 AM

hTCC1(1-232) Map.mpd (1 > 726) Site and Sequence

Enzymes : 212 of 515 enzymes (Filtered)

Settings: Linear, Certain Sites Only, Standard Genetic Code

A 7GCA 7CACCA 7CACCA7CACA 7GAGCAGAGC37 7C47CA7CGA 7CCAACGA 7CAG7GCC A77GACGGC7TG~7ACG«' Tr^^rr"! " rnr^ ir , r ,., 

< 1 i 1 ' ~ ' ; ' : _ r— |— , 

7ACG7AG7GG 7AG7GG 7AG7G 7AC7CG7C7CGCAAG 7iG7AGC 7 AGG i i GC 7AG 7CACGG7AAC 7GCCGA ACA 7GC r GGAAGACCCC 7 AACC77A TGGG 7 
M H H H H H H M S R A F I (OP 7 [ S A [ OGLYOLLG [ G [ P 

ACCAAGGGGG7A7CC777AC7CC7CAC7AGAG7AC77CGAA.UAGCCC7GGAGGAGC7GGCAGCAQCG777CCGGG7GA7GGC7^G77AGG7 T ^^GC'"GC 

' ! : : ! ~ — ; — : i. or\c\ 

7GG77CCCCCA7AGGAAA7GAGGAG7GA7C7CA7GAACCi i S i 7CGGGACC7CC7CGACCGTCG7CGCAAAGGCCCiC7ACCGACC4A7CCAAGCCGGC r - 
NQGGILYSSU£Yr£KAL££LAAAF? G05wLGSAA - 

GGACAA A 7A CGCCGGC A AAA ACCGCAACCACG7GA A 777777CCAGGA AC 7GGCAGACC 7CGA 7CG 7C AG C7CA7CAGCC7GA7C r ACGACCAGGC r A AC 

1 : ! ' " 1 ' ' " " " ~ — ■ _ qr\p 

CC7G77TA7GCGGCCG777T7GGCG77GG7GCAC77AAAAAAGG7CC77GACCG7C7GGAGC7i,GCAG7CGAG7AG7CGGAC7AGG7GC7GG7CCGG77' : ' 
0 K Y A G K N R N H V N F ? G E L A 0 U G R Q L ( 5 L ( H Q ^ Q A * 

GCGGTCC AG ACGACCCGCG AC A7CC 7GGAGGGCGCC A AGA A AGG 7C7CG A G77CG7GCQCCCGG7GGC7G7GGACC7GACC7AC A TCCCGG7CG7CGGG'" 

CGCCAGG7C7GCTGGGCGC7G7AGGACC7CCCGCGG77C7T7CCAGAGC7CAAGCACGCGGGCCACCGACACC7GGAC7GGA7G7AGGGCCAGCAGC^C^ ^ 
AVQTTRO {LcGA:<:<GL£FVR?7AV0L7Y{?yVG 

ACGCCC7ATCGGCCGCC77CCAGGCGCCG7777GCGCGGGCGCGA7GGCCG7AG7GGGCGGCGCGC77GCC7AC77GG7CGrGAAAACGC7GA7CAACGC 

i i t : < i r : ' i 1 : i i- : 1 £QQ 

7GCGGGATAGCCGGCGGAAGGTCCGCGGCAAAACGCGCCCGCGC7ACCGGCA7CACCCGCCGCGCGAACGGA7GAACCAGCAC7777GCGAC7AG77GCG 
HAL5AAFQAPFCAGAMAVVGGALAYLVVK7L £ N A 

GAC7CAAC7CC7CAAA77GCT7GCCAAA77GGCGGAG77GG7CGCGGCCGCCA77GCGGACA7CA777CGGA7G7GGCGGACA7CA7CAAGGGCA7CC7C 

« i 1 f 1 : ' « 5— : r- i r^600 

C7GAG7TGAGGAG77 t AACGAACGG777AACCGCC7CAACCAGCGCCGGCGG7AACGCC7G7AG7AAAGCC7ACACCGCC7G7AGTAG77CCCGTAGGAG 
TdLLKLLAKLAELVAAA [ A 0 [ \ 3 0 V A D I [ K G " f " L 

GGAGA AG 7G 7GGG AG 7TCA7C A CAA A CGCGC7C A ACGGCC7G A A AG AGC777GGGAC A AGC7CACGGGG7GGG7GACCGGAC7G77C7C7CGAGGG7GG 7 



CC7C77CACACCC7CAAG7AG7G777GCGCGAG77GCCGGAC777C7CGAAACCC7G77CGAG7GCCCCACCCAC7GGCC7GACAAGAGAGC7CCCACCA 
G E V W £ F [ TNALNGLKELW0KLT.-GWV7GLFSRGW 

CGAACC7GGAGTCCTTC7AAGAA77C 

' i i 726 

GC77GGACC7CAGGAAGA7TC77AAG 

S N L £ S F . £ F 



700 



Monday, July 26, 1999 10:50 AM

hTCC1(184-392) Map.mpd (1 > 661) Site and Sequence

Enzymes : 212 of 515 enzymes (Filtered)

Settings: Linear, Certain Sites Only, Standard Genetic Code



A TGCA TCACCA TCACCA7CACGA TGTGGCGGACA ~CA 7C A A GGGCA T CC TCGG AG A AG TG 7GGG AG 7 TT A "C^CA^ACGCGCTC k A-ACGGCC7GAAAG AGC 

1 1 ' ~" — 1 ■ ■■■■ — — j QQ 

T-XCG 1* AG TGG TAG TGG TAG TGC T AC ACCGCC TG T iGTAG T TCGCG TAGG AGCC TC TTC AC ACC CTCAAGTAGTG" rrGGGCGAG7TGCCGG AC TTTC 7CG 
rt H H H H H H 0 V A 0 E i !< G I L G £ V W £ r [ 7 N A I N G L K c 



T T TGGGACAAGC TCACGGGG TGGG TGAC CGGAC TG TTC TG TCGAGGG TGG TCGAACCTGGAG TGCT TCTTTGCGGGCGTCCCCGGC T TGACCGGCGCGAC 

I ; : : : ; : — u_ „ , 1 _ 

AAACCC TG 7 TCGAGTGCCCC AC CCAC TGGCC 7GA C A AG AGAGC7CCC AC CAGC77GGACC7CAGG A AGA A A CGCCCGCAGGGGCCGA AC 7GGCCGCGC TG ~ 



L W 0 K L TGWVTGU.-3 3 G W 5 N L £ 5 " P a 



G L 7 G A 



CAGCGGCTTGTCGCA AGTGACTGGC TTGTTCGGTGCGGCCGGTCTGTCCGCATCGTCGGGC TTGGC 'i"CACGCGGA7AGCC7GGCGAGC7CAGCCAGC77G 

GTCGCCGAACAGCGrTCACTGACCGAACAAGCCACaCCGGCCAGACAGGCGTAGCAGCCCGAACCGAGrGCGCCTArCGGACCGCrCGAGTCGGTCGAAC 
5GLSQVTGLFGAAGLSA5SGLAHAQSUASSASL 

CCCGCCC TGG CCGGCATTGGGGGCGGGTCCGGTTTrGGGGGCrrGCCGAGCCTGGCTCAGGTCCATGCCGCCTCAACTCGGCAGGCGCrACGGCCCCG AG 

GGGCGGGACCGGCCG7AACCCCCGCCCAGGCCAAAACCCCCGAACGGCTCGGACCGAGTCCAGGTACGGCGGAGTTGAGCCGTCCGCGATGCCGGGGCTC ^ 
PALAG [GGG3GFGGLPSLAQVHAA3TRQALRPR 

C 7GA TGGCCC GG TCGG CGCCGC TGCCGAGCAGG 7CGGCGGGC AG TCGCAGC 7GG7C7CCGCGCAGGG77CCC A AGG 7 a 7GGGCGGACCCG7AGGCA7GGG 

, 1 , ; ' < ' : 1 ! < r- f— ' 1 » j = l. 5 qq 

3 GACTACCGGGCCAGCCGCGGCGACGGCTCGTCCAGCCGCCCGTCAGCGTCGACCAGAGGCGCGTCCCAAGGGTTCCATACCCGCCTGGGCATCCGTACCC 
■AOGPVGAAASQVGGaSQLVSAGGSaGMGGPVGMG 

CGGC A TGC A C CCCTCTTCGGGGGCGTCGAAAGGGACGACGACGAAGA AG TACTCGGAAGGCGCGGCGGCGGGC AC TGAAGACGCCGAGCGCGCGCC A GTC 



GCCG 7ACG 7GGGGA GA AGCCCCCGC AGCT77CCC7GC TGC TGC 77C77C A 7GAGCC77CCGCGCCGCCGCCCG7GAC77CTGCGGC7CGCGCGCGG7C AG 
G M H P S S G A 5 K G7 77K K Y SE G A A AG7£QA£R A'P~V 

GAAGC7GACGCGGGCGG7GGGCAAAAGGTGCTGG7 ACGAAACG7CGTCTAACGGCGAAT7C 

1 f 1 1 ; f : [ 

C 7 7CG AC 7GCG CC CSC C AC CCG 7777CCACGACCA TGC TT7GCAGCAGA77GCCGC 774 AG 
SAQAGGGQXVLVRNVV.-R3 1 



fondav, July 25, 1935 10:^3 AM . n I ± \J - \ ^ \ 

hTCC1 (1-129) Map.MPD (1 > 411) Site and Sequence

Enzymes : All 515 enzymes (No Filter)
Settings: Circular, Certain Sites Only, Standard Genetic Code



ArGCArcACCATCACCArcACATGAGCAGAacG'TCArcArcGATCCAACGAiCAarGCCAr ^CGGcrrGrAasACcrrc rGGGGArrGr 

l— 1 ' r ■ — - — **-* ' ; " " 1 ! f • — ; i — - - i " t J 

TACG7AG7GGrAGTGG~AG7GTACrCGTCTCGCAAGrAGrAGCTAGGTTGC 'AG~CACGGrAACrGCCGAACA7GC rG r -AAGAC , ~CC7A''' 1 ^ 

MKHHHHHilSRAFJ ! Q P 7 [ S A 'GGLYCLLG t G 

AA7ACCCAACCAAGGGGG7A7CC777AC7CC7CAC7AGAG7AC77CGAAAAAGCCC 7GGAGGAGC7GGCAGCAGCG777CCGGG7GA7GCC7 
■. 1 : 1 , ; : 1 i i : 7" ^ ,q„ 

TTATGGGrTGGTTCCCCCATAGGAAATGAGGAGTGArcrCATGAAGCTTrrrCGGGACC-CCrcGACCG-CGrCGCAAA^ftrrCACrACCG" 
! P M Q G G tLY5SL£YFEKALE£LAAAFPG0G " 

GGTTAGG-TCGGCCGCGGACAAATACGCCC-GCAAiAACCGCAACCACGrGAAr-rrrrcCAGGWCTGGCAGACCiCGATCGTCAGCrCAiC 

CCAATCCAAGCCGGCGCCTGTTTATGCGGCCGTTTTTGGCGTrGGTGCACTTAAAAAAGGrcCT-GACCGrCTGGAGCrAGCAGTCGAGTAG ^ 
WLGSAAOXYAGXNRNHVNFFGEUAOLORQLt 

AGCCrGATCCACGACCAGGCCAACGCGGTCCAGACGACCCGCGACA-CCTGGAGGGCGCCAAG.iAAGGTCrCGAGrTCGTGCGCCCGGrGGC 

TCGGACTAGGTGCTGGTCCGGTTGCGCCAGGTcrGCTGGGCGCTGTAGGACCTCCCGCGGrTCTTTCCAGAGCTCAAGCACGCGGGCCACCG ^ 
SLtHOaANAVQrTROILEGAXKGLEFVRPVA 

U TG7GGACCTGACCTACA7CCCGG7CG7CGGGCACGCCC7A7AG 

yj — i . 1 ■ i 1 1 a. i ! 

ffi ACACCTGGACTGGATGTAGGGCCAGCAGCCCGTGCGGGATATC 
® V0L7YIPVVGHAL. 
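
Each map above prints the coding strand, its complementary strand, and the translation. Two properties of these printouts can be checked mechanically: the second strand is the base-by-base complement of the first, and the map length follows from the number of codons. The sketch below is illustrative only; the helper names `complement` and `construct_length` are hypothetical, and the assumption that each construct consists of a seven-codon Met-His6 leader, the stated residue range, one stop codon, and any trailing restriction-site nucleotides is inferred from the printed sequences rather than stated in the source.

```python
# Base-pairing rules for an unambiguous DNA alphabet.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(seq):
    """Return the complementary strand, written in the same left-to-right order."""
    return seq.upper().translate(_COMPLEMENT)


def construct_length(n_tag_codons, n_residues, n_trailing_nt=0):
    """Nucleotide length of tag + insert + one stop codon + trailing nucleotides."""
    return 3 * (n_tag_codons + n_residues + 1) + n_trailing_nt


# Met + six His codons as printed at the start of the hTCC1 maps.
tag = "ATGCATCACCATCACCATCAC"
print(complement(tag))  # TACGTAGTGGTAGTGGTAGTG

# hTCC1(1-129): 7 tag codons + 129 residues + stop.
print(construct_length(7, 129))  # 411

# hTCC1(1-232): as above, followed by a 6-nt EcoRI site (GAATTC).
print(construct_length(7, 232, 6))  # 726
```

The computed lengths match the "(1 > 411)" and "(1 > 726)" ranges given in the map titles, which is a useful cross-check when individual digits in the printouts are illegible.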




onday, July 23, 1SS9 1G:*o AM - }\ a f-jj Q C - 1 ? ^ y 



_^^^XCCt^?£:.^i > 1529) Sits and Sequerc 
Enzymes : All 515 enzymes (No Filter)
Settings: Linear, Certain Sites Only, Standard Genetic Code

CA 7A7GCA 7CACCA 7CACCA7CACACGGCCQCG r CCGA 7AACT7CCAGC ~GTCCCAGGGTGGGCAGGGAT7Cg1cCA77CCGa7C'g:5GCAGGCGA "GGCu.A 

l. -■ - „.,■■{....-■ ,.„,,..,.,. - ■ _. „ . ^. , 

G7A7ACGTAGTGG7AG TGG TAG TG 7GCCGGCGCAGGC 7A77GAAGG7CGACAGGG 7CCCACCCG7CCC 7 AAGCGG7AAGGC 7AGCCCG7CCGC7AC r GCT 

r, i Met /HIS TAG- H^ 11 -'- ■ ■ ■ " Rat2 " 

H rt H H H H H H 7 A A 3 3 N F G L S Q G G Q G F A f p / q q a m a 

TCGCGGGCCAGA7CCGA7CGGG7GGGGGG7CACCCACCGT7CA7A7CGGGCC7ACCGCC77CC r CGGC 7~GGG7G "TGTCGACAACAACG^CAACGGCG- 1 " 

! ^ : 1 1 ' = — r— II I OC>Q 

AGCGC CCGG7CTAGGC7AGCCCACCCCCC AG 7GGG7GGCA AG 7 A 7AGCCCGGA7GGCGGAAGGAGCCGAACCCACAACAGC7G77G 77GCCG 7 7GC r GC^ - 



«Ha12: 



[AGO 1 R S G G G 3 ? 7 V H E G P 7 A F L G L G V v q 



N N G N G. A 



SCO 



A CGAG TC CA ACG CG 7GG 7C GG GAG CG C 7CCGGCGGCA A G7C7CGGC A 7C7CC AC CGGCGACG7G A 7CACCGCGG7CGACGGCGC7CCGA 7CAAC7CGGCC 
i i i ^ 1 5 ' i ^ , II l. 3QQ 

TGC7CAGGT7GCGC AC CAGCCC7CGCGAGGCCGCCQ77C AG AGCCG7AGAGG7GGCCGC7GC AC 7AG7GGCGCCAGC7GCCGCGAGGC7AG77GAGCCG3 
i" i i ii * Ra12 " 

RVQRVVGSAPAA3LG I S7GCV ETAVOGAPCNSA 

ACCGCGA7GGCGGACGCGC77AACGGGCATCA7CCCGG7GACG7CATC7CGG7GACC7GGCAAACCAAG7CGGGCGGC^CGCG"ACAGGGAACG7^ACA7 
' i < i ' i * i ' « < i i ^ .-oo 

7GGCGC T ACCGCCTGCGCG AA 7TGCCCG TAG 7AGGGCCACTGC AG7AGAGCC AC TGG A CCG777GG77CAGCCCGCCG7GCGCA7GTCCC77GC AC 7GTA 

^"APtA0ALNGHHPG0VE3VTWGr f <SGGTRTGNVr 
ffiTGGCCGAGGGACCCCCGGCCGAA77CCTAG7ACC7AGAGG7TCAA7GAGCAGAGCG77CA7CA7CGA7CCAACGA7CAG7GCCA7TGACGGC77G7 A CGA 

' i i i i * 1 i ■ : , : 1 u^OQ 

^'ACCGGCrCCCTGGGGGCCGGC7TAAGGA7CATGGA7C7CCAAG77AC7CGTC7CGCAAG7AG7AGC"AGG77GC7AG7CACGG7AAC7GCCGAACA-7GC7 
^ _ Ra12 i rccaRI j Thrombin | hTCCl - 

?|jLAEGPPAEFLVPRGStt3RAF t t 0 P 7 i 3 A V Q G I ? ~ 0 

f _ CC7TC7GGGGATTGG A AtACCCAACC A AGGGGG7ATCC7TTiC7CC7CAC7 AG AGTACT7CG A AAA AGCCCTGGAGGAGC7GGCAGCAGCGTT7CCGGG7 
1 '■ * s 1 — 1 — i ■ = '■ ' r : ; ~ : — ■ ~ -i 1 , ^ 

Q GGAAGACCCC7AACCTTA7GGG7TGGT7CCCCCA7AGGAAATGAGGAGTGArC7CA7GAAGC77TT7>GGGACC7CC7CGACCGrCG7CGCAAAGGCCCA 

, hT rr ^, 

P LLG [ G tPNGGGILYSSL£YrcKAL c ELAAAf r PG 

a 

GATGGCTGG TTAGG T7CGGCCGCGGACA A A TACGCCGGC A AAAACCGC A ACC ACG TGA A TTTTTTCCAGG A AC 7GGCAGACC7CG A TCG7CAGCTCATCA 

*** • 1 1 1 1 * ' i ■ i i * i 1 . , f- 700 

C TAC CG A CC AA TC C A AGC CGGCGCC 7G 777A 7GCGGCCG 77 7 TTGGCG 77GG 7GC A C TTAAAAAAGG7CCTTGACCGTC TGG AGCT A GC A GTC GAG TAG 7 
m i hTCCl ■ 

DGWLGSAAOKYAGKNRNHVNFFQELAQLQRQL t 

GCCTGA7CC ACG AC CAGGCC A ACGCGGTCCAGACGACCCGCGACA7CCTGGAGGGCGCCA AG A A AGG7CTCGAG77CG7GCGCCCGG7GGC7G7GGACCT 
' 1 ' ! ' i ' i ! 5 1 ' H i , { < h 300 

CGGAC7AGG7GC7GGTCCGGTTGCGCCAGGTCTGCTGGGCGCTGTAGGACC7CCCGCGGTTCTTTCCAGAGC7CAAGCACGCGGGCCACCGACACC7GGA 
■ i ■ ■ i ii hTCCl - 

S L I HO QANAVQT^TRQ I LEGA.K K G LEFVRPVAVOL 
GACCTACATCCCGGTCGTCGGGCACGCCCTATCGGCCGCCTTCCAGGCGCCGTTTTGCGCGGGCGCGATGGCCGTAGTGGGCGGCGCGCTTGCCrACTTG 

i i 1 i i f ! i 1 i . : i , ; , 1 , h 900 

CTGGATGTAGGGCCAGCAGCCCGTGCGGGATAGCCGGCGGAAGGTCCGCGGCAAAACGCGCCCGCGCTACCGGCATCACCCGCCGCGCGAACGGATGAAC 

I hTCCl ^mmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmm^ 

TY IPVVGHALSAAFGAPFCAGAMAVVGGALAYL 



cnday, July 25 , t S99 1 0:^5 AM ^ . =32 
;(Tnr)hTCC1.nncd f1 > 1623) Site and Sacue^c ' ' 3ge 

17CG7GAAAACGC 7GA7C A A CGCGA C 7C A AC i\CC IC AAA 7 7GC 7 TGCC AA A77GGCGGAG 77GG 7CGCGGCCGCCA 77GCGGACA 7CA777CGGAT3 7GG 
*AGCA C 7 77 TGCG AC 7 AG 7 7GCGC 7GAG7TGAGGAG777AACGAACGG777AACCGCC ~CA A CCAGCGCCGGCGG7AACGCC7G7a'g7A A AGCC 7 AC AC C 

J i n . u,. ,,., ,, , , , , r . - i. nr »l n r < ■ .■ ■■ rSCC* - - ■ . 

V V ;< 7 L. I N A 7 G I I X L L A K L A £ L V A A A [ a Q { r S Q v 
CGGACA 7C A TC A A GGGC A 7CC 7CGGAGAAG7G7GGGAG77CA ~CACAAACGCGC7CAACGGCC7GAAAGAGC777GQGAtAAGC7C ACGGGG7GGG7GAC 

GCC7G7AG7AG77CCCG7AGGAGCC7Cr7CACACCC7CAAGrAG7Gr77GCGCGAG77GCCGGAC777C7CGAAACCCrGr7CGAGrGCCCCArrcAC7G ^ 

' i ■ ■ ■ ■ HTCC1 11 

A 0 ( i K G E U G £ V W £ F [ 7 N A L N G L ;< £ l W 0 X L 7 G W V 7 

CGGAC7G77C7C7CGAGGG7GG7CGAACC7GGAG7CC7TC777GCGGGCG7CCCCGGC77GACCGGCGCGACCAGCGGC77G7CGCAAG7GAC7GGC77G 

GCC7GACAAGAGAGC7CCCACCAGC77GGACC7CAGGAAGAAACGCCCGCAGGGGCCGAAC7GGCCGCGC7GG7CGCCGAACAGCG77CAC7GACCGAAC 
i i ■ i ■ i ■ i i r NTCC 1 ■ 

GLFSRGWSNLE5FFAGVPGL7GA7SGISGV7GL 
7TCGG7GCGGCCGG7C7G7CCGCA7CG7CGGGC77GGC7CACGCGGA7AGCC7GGCGAGC7CAGCCAGC77QCCCGCCC7GGCCGGCA77GGGGGCGGG7 

AAGCCACGCCGGCCAGACAGGCG7AGCAGCCCGAACCGAG7GCGCC7A7CGGACCGC7CGAG7CGG7CGAACGGGCGGGACCGGCCG7AACCCCCGCCCA 
i 'I i ■ ■ ■ i ■ ■ hTCCl ■ 

fFGAAGLSA55GLAHAQSLA3SASLPALAG [ GGG 

1CCGG7777GGGGGC77GCCGAGCC7GGC7CAGG7CCA7GCCGCC7CAACTCGGCAGGCGC7ACGGCCCCGAGC7GA7GGCCCGG7CGGCGCCGCTGCCGA 

J 1 * 1 ' ' ' ' ' ^ 1 * f ' \r> 14QO 

GGCCAAAACCCCCGAACGGC7CGGACCGAG7CCAGG7ACGGCGGAG77GAGCCG7CCGCGA7GCCGGGGC7CGAC7ACCGGGCCAGCCGCGGCGACGGC7 
i ■ i ' hTCC1 

5G F G G L P 5 L A G V H A A 5 7 3 Q A L R? R A 0 G P V G A ' A" A " £ 
GCAGG7CGGCGGGCAG7CGCAGC7GG7C7CCGCGCAGGG77CCCAAGG7A7GGGCGGACCCG7AGGCA7GGGCGGCA7GCACCCC7C77CGGGGGCG7CG 



CGrCCAGCCGCCCG7CAGCGTCGACCAGAGGCGCG7CCCAAGGG77CCA7ACCCGCC7GGGCA7CCG7ACCCGCCG7ACG7GGGGAGAAGCCCCCGCAGC 
, i hTCCl 

QVGQQSQLVSAaGSQGMGGPVGMGGMHPSSGAS 
AAAGGGACGACGACGAAGAAG7AC7CGGAAGGCGCGGCGGCGGGCAC7GAAGACGCCGAGCGCGCGCGAG7CGAAGC7GACGCGGGCGG7GGGCAAAAGG 



1 500 



TTTCCCTGCTGCTGCTTCTTCATGAGCCTTCCGCGCCGCCGCCCGTGACTTCTGCGGCTCGCGCGCGGrCAGCTTCGACTGCGCCCGCCACCCGTTTTCC 
■ i i ii. > i ■ i i hTCCl ■ 

KGTTTKKYSEGAAAGTeOAeRAPVEAOAGGGQK 

7GC7GG7ACGAAACG7CG7C7AAGAA77C 



-4- 1600 



1629 



ACGACCA7GC777GCAGCAGA7TC77AAG 



» hTCCl I jlcoRij 
V L V 3 N V V . £F 



huraday, July 22. 1399 1:33 ?M ^ Hl£C~L (T>'T H 

TCCl(-T0.1).mcd (1 > 1225) Site and Sequen. ^ V ' ~J ' 

Enzymes : 2 of 515 enzymes (Filtered)

Settings: Linear, Certain Sites Only, Standard Genetic Code

CA 7 A TGCA TC AC C A TCACCA fCACA TGAGCAGAGCG TTCA "CA TCGA FCCAACGA f CAG TG CC A TTG A CGGCfTG 7ACGACC7 ft-f GGGGA TTGGA % fAC 

GTA "ACGTAGIGG" AG'GG FAGTGTAC TCG TCTCGCA AG TAG 7AGC TA GG7TGC TAG TCA CGGTA A CTQCCGA A CA7GCTGGAAGACCCCTAACC "A T^ 

) Met / HIS TAG » ■ » ■ ■ ■ ■■■> «**»■ iiTCCl _~ 

■HKHHHHHHMSRAFI I G P T t S A (DGLYQLLGE 



I 



CCA ACCAAGGGGGTA TCC T TTAC TCCTC AC TAG AG TAC rrCGAAAAAGCCCTGGAGGAGC TQGCAGCAGCGTT7CCGGG7GA 7GGC TGGT~AGGTTCGG^* 

5 ' ' ' : ' = ! 1 t ■ : I j nn 

GG TTGGTTCCCCCATaGGAAATGAGGAGTGATC TCA TGAAGCTTT TTCGGGACCTCCTCG A CCGTCGrCQCA A AGGCCCAC7AC r GACCAATCCAAGCCG - 
P N 0 G G t L Y S 3 L £ Y r £ K A I £ £ L A A A r ? G 0 G W L G S A 

CGCGGACAAA TACGCCGGCAA A AACCGCAACCACGTGA A TTTTTTCCAGGAAC TGGC AGACC TCGA TCG TCAGC TCA TCAGCCTGATCC A CGACCAGGCC 

GCGCCTGTTTATGCGGCCGTTTTTGGCGTTGGTGCACTTAAAAAAGGTCCTTGACCGTCTGGAGCTAGCAGTCGAGTAGTCGGACTAGGTGCTGGTCCG^ ^ 
■ '■ ii i i i i hTCCl - 

AOKYAGKNRNHVN?'PQSUA0L0f?aLtSi, [HOGA 

AACGCGGTCCAGACGACCCGCGACATCCTGGAGGGCGCCAAGAAAGGTCTCGAGTTCGTGCGCCCGGTGGCTGTGGAC* TGAC r TACAT" r C r GTCGTCG 
i 1 : , 1 > 1 ■ , : : ; s L 1 T" " L-j. 400 

TTGCGCCAGGTCTGCTGGGCGCTGTAGGACCTCCCGCGGTTCTTTCCAGAGCTCAAGCACGCGGGCCACCGACACCTGGACTGGAiGTAGGGCCAGCAGC 
D 1 1 1 1 1 " ' hTCCl * 

CiNAVGTTRO IL£GAK.<GLeFvaPVAVQLTYfPVV 

fin 

S GGCACGCCCTATCGGCCGCCTTCCAGGCGCCGTTTTGCGCGGGCGCGATGGCCGTAGTGGGCGGCGCGCTTAAGCTTGCCTACTTGGTCGTGAAAACGCT 

»S5 1 ! 1 ' 1 I ' i 1 : 1 1 : ' H -j 1 . . r-QQ 

J£ CCG TGCGGG A TAGCCGGCGGAAGGTCCGCGGCA A A ACGCGCCCGCGCTACCGGC A TCACCCGCCGCGCGA A TTCGAACGGATGAACCAGC AC TTTrGCG A '" 

y " ■hTccv i p^gr 




e GA TCA ACGCGAAGCTT A CTCA AC TCCTCAA A TTGCTTGCCAAATTGGCGGAGTTGGTCGCGGCCGCCATTGCGGAC A TCA TTTCr > GATGTG'"CGGACAT r 

H; , i ■ . 1 ' i ' i ' i i i , jJL- , 1 600 

g CTAGTTGCGC TTCGAA TGAGTTGAGGAGTTTAACGAACGGTTTAACCGCCTCAACCAGCGCCGGCGGJAACGCCTGTAGTAAAGCCTACACCGCCTGTAG 
jIT »OeL£T£D j [Hind3 f ' " 11 ' 1 ■ hTCCl - 

O [ NAKLTQLLKLLAXLA£LVAAA I a 0 C I S 0 V A 0 E 

S ATCAAGGGCATCCTCGGAGAAGTGTGGGAGTTCATCACAAACGCGCTCAACGGCCTGAAAGAGCTTTGGGACAAGCTCACGGGGTGG^TGACCGGA^TGT 

' I ■ 1 ! « ' 1 ! ' = ' : i : L-+ 700 

TAGTTCCCGTAGGAGCCTCTTCACACCCTCAAGTAGTGTTTGCGCGAGTTGCCGGACTTTCTCGAAACCCTGTTCGAGTGCCCCACCCACTGGCCTGACA 



< hTCCl i 



iKG CLGSVWEFCTNALNGLKELWDKLTGWVTGL 

TCTC TCGA GGGTGGTCGAACCTGGAG TCC TTCTTTGCGGGCGTCCCCGGCTTGACCGGCGCGA CCA GCGGCTTGTCGCA AG TGACTGGCTTGTTCGGTGC 
' i i < 1 1 i ■ i ' r— i— . , , f , . r 300 

AGAGAGCTCCCACCAGCTTGGACCTCAGGAAGAAACGCCCGCAGGGGCCGAACTGGCCGCGCTGGTCGCCGAACAGCGTTCACTGACCGAACAAGCCACG 

I I I I ■ I hTCCl ^''''^'^'■'■^■■■■■■■■■■^■■■■MnNMHHMHMHHMMMMMI 

F SR GW S H L£S. e F w AGVPGL T G A T3GLSQV TGLFGA 

GGCCGGTCTGTCCGCA TCG TCGGGCTTGGC TCA CGCGGATAGCCTGGCGAGC TCAGC CAGCTTGCCCGCCCTGGCCGGCATTGGGGGCGGGTCCGGTTTT 
1 i 1 ■ i ! 1 i 1 i . ■ « f . 1 , ( , f. 900 

CCGGCCAGACAGGCGTAGCAGCCCGAACCGAGTGCGCCTATCGGACCGCTCGAGTCGGTCGAACGGGCGGGACCGGCCGTAACCCCCGCCCAGGCCAAAA 

■ i i hTCCl ---------- 

AGL5ASSGUAHA03UASSASL?AUAGtGGGSGF 



nursday, July 22, 1 S99 1:35 PM _ ^_ a , 

rCGlf-TD.D.fncd (1 > 122S1 Sita and Sxxssr. . ' 

:GGGGCTTGCCGAGCCiGGC7CAGG7CCA7GCCGCC7CAAC7CGGCAGGCGC7 *CGGCCCCGAGG7GA TGGCCCGG7CGGCGCCGC7GCCGAGC* GG7~G 

■ i i 5 i ' r ^ ! Z 1. |C p 0 

CCCCCGAACGGC TCGGACCGAGTCCAGGTACGGCGGAGTTG-GCCGf CCGCGA 7GCCGGGG C7CGAC 7ACCGGGCCAGCCGCGGCGACGGC 7CG7CCAGC 
h i l i . ■ ■ i " ■ i hTCCI ■" ■ 

GGUPSLAQVHAASTaGALRPSAOGPVGAAASQV 

GCGGGCAG rCGCAGCTGGTCTCCGCGCAGGG^rCGCAAGG ~iTGGGCGGACGCGTAGGCATGGGCGGCATGCACCCCTC~7^GGGGGCGrCG A * A GGGA'" 
■ i : 5 i : i : 1 — 1 ] , QQ 

CGCCCGTCAGCGTCGACCAGAGGCGCGTCCCAAGGGr"CCArACCCGCCTGGGCAiCCGrACCCGCCG7A;G7GGGGAGAAGCCCCCGCAGCTTTCCCTG 
■ i ■ ■ i i hTCC 1 ■ ■ 

GGCSGLV SAGGSGGf1GG?VGi1GGriH?sSGASKG 7 

GACGACGAAGAAGTACTCGGAAGGCGCGGCGGCGGGCACTGAAGACGCCGAGCGCGCGCCAGTCGAAGCrGACGCGGGrGGrGGGCAAA^GGrGCrGC TA 

. 1 • i • i i ' J s • ' r 1 . u 1200 

CTGCTGCTTCTrCATGAGCCTTCCGCGCCGCCGCCCGrGACTTCTGCGGC rCGCGCGCGG FCAGC77CGACTGCGCCCGCCACCCG7TTTCCACGACCA7 

1 h7CC1 

'TTKKYS£GAA4GT£0A£RA?V£AOAGGGQKVLV 
CGAAACG rCGTCTAACGGCGAATTC 



GCrTTGCAGCAGATTGCCGCTTAAG 
wmmm hTCCt *— i j £co3( [ 
Friday, July 23, 1999 AM

hTCC1(fl)-TM2 Map.MPD (1 > 1225) Site and Sequence

Enzymes : 1 of 515 enzymes (Filtered)

Settings : Circular, Certain Sites Only, Standard Genetic Code

CA7a7GCa7CACCA7CACCA7CACA7GAGCAGAGCG77CA7CA7CGA7CCAACGA7CAG7QCCA 7-7 G A -GGC 7 7G7ACGAC''" 77C7^n.GfiA 

j , ' * ' : 1 1 ~ 5 - ■ -i ! — j ; : — Q ( 

G7A7ACG7AG7GG7AG7GG7AG7G! AC7CGrc7CGCAAG7AG7AGC7AGGTTGC7AGTCACGG7AACrGC:GAACA7QC7GGAAG^CrcC7 

J - s\..\\\wi^Bmmmmmmmmmmmm wmmhm . "flTCCl 1lin " n m " ■ ■■■ „„ , - ium 

H M H H H H H H M 5 3 A F I [ Q P 7 { S A £ 0 G L Y 0 L L G 

T7GGAA7ACCCAACCAAGGGGG7A7CC777iC7CC7CAC7AGAG7AC77CGAAAAAGCCC7GGAGGAGC7GGCAGCAGCG7 77C^-GG7^ A 

AACC7TA7GGG77GG77CCCCCA7AGG AAATGAGGAG7GA 7C7CA7GAAGC 7777 7CGGGACC7CC7CGACCG7CG7CGCAA4GGCCCA r ] 

hTCCt ■ ' 1 



t32 



IGIPNQGGILYSSLEYFEKALSeLAAAFPGO 

7GGC7GG77AGG 77CGGCCGCGGACAAA 7ACGCCGGCAAAAACCGCAACC 4CG 7GAA777777CCAGGAAC 7GGCAGACC 7 r GA7 r G7CAG 

' ; f ! * 1 ' 1 5 ' 1 ' i ■ ~ Z { 1 !l 213 

.ACCGACCAArCCAAGCCGGCGCC7GT77A7GCGGCCG77T77GGCG77GGTGCAC77AAAAAAGG7CC77GACCG7C7GGAGC7AGCAG7C 

'nT^l ~ 11 

GWLGSAADKYAGKNRNHVNFrQ£i. A o L Q RG 

C7CA7CAGCC7GA7CCACGACCAGGCCAACGCGG7CCAGACGACCCGCGACAAGC77A7CC7GGAGGGCGCCAAGAAAGG7C7CGAG7TCG 
— « 1 1 i 1 1 1 i * ; 1 i . — j 1 1 ( : 

1i GAG7AG7CGGAC7AGG7GC7GG7CCGG77GCGCCAGG7C7GC7GGGCGC7G77CGAA7AGGACC7CCCGCGG77C777CCAGAGC7CAAGC 
j hTCC1 l |Hjng3 j 1 DELEJEQ 

t L t SL I MQQANAVQ77R0KLI LEGAKKGLEF 

1 7GCGCCCGG7GGC7GTGGACC7GACC7ACA7CCCGG7CG7CGGGCACGCCC7A7CGGCCGCC77CCAGGCGCCG7777GCGCGGGCGCGAT 

1" ' ; ' 1 ' ' ! 1 1 1 ' 1 : • ' 1 ■ 'r—- 455 

j ACGCGGGCCACCGACACCTGGAC7GGA7G7AGGGCCAGCAGCCCG i G C GGG A 7 AG CCGGCGG A AGG7CCGCGGCA A AACGCGC C CGCGC 7A 

VRP VAVOLTYlPVVGHALSAArQAPFCAGAM 
GGCCG7AG7GGGCGGCGCGC77GCC7AC77GG7CG7GAAAACGC7GA7CAACGCGAC7CAAC7CC7CAAA77GC77GCCAAA77GGCGGAG 



546 



CCGGCA7CACCCGCCGCGCGAACGGA7GAACCAGCAC t 777GCGAC 7AGT7GCGC7GAG77GAGGAG777AACGAACGG777AACCGCC7C 
DELETE 0 ■ 

AVVGGALAYLVVK7L E N A 7GLLKLLAKLAE 

77GG7CGCGGCCGCCA77GCGGACA7CA777CGGA7G7GGCGGACA7CA7CAAGGGCA7CC7CGGAGAAG7G7GGGAG77CA7CACAAACG 

— H f i f i • ' « ( 1 ? i i 1 f 1 1 1 637 

A ACC AGCGCC GG CGG 7 A ACGCC7G7AG7AAAGCC7ACACCGCC 7 G7AGT AG 77CCCG7AGGAGCC7C77CACACCC7CA AG 7AG7G 7 7 7GC 
■ DELETED 

LVAAAIAOIISGVAOI EKGILGEVWEFITN 

CGAAGC77C7CAACGGCC7GAAAGAGC777GGGACAAGC7CACGGGG7GGG7GACCGGAC7G77C7C7CGAGGG7GG7CGAACC7GGAG7C 

i— i 1 t f 1 H 1 ! ' i 1 ! 1 i 1 f 1 728 

GC7TCGAAGAG77GCCGGAC777C7CGAAACCC7G77CGAG7GCCCCACCCAC7GGCC7GACAAGAGAGC7CCCACCAGC77GGACC7CAG 

Hl l Htnd3 | ■ r ■ hTCC1 ' 

AKLLiNGLKELWD'KLTGWVTGLFSRGWSNLES 

CTTCTfTGCGGGCGTCCCCGGCTTGACCGGCGCGACCAGCGGCTTGTCGCAAGTGACTGGCTTGTTCGGTGCGGCCGGTCTGTCCGCATCG 

•H , i , 1 i \ 1 1 1 f 1 i 1 1 1 1 1 819 

GAAGAAACGCCCGCAGGGGCCGAAC7GGCCGCGC7GG7CGCCGAACAGCG77CAC7GACCGAACAAGCCACGCCGGCCAGACAGGCG7AGC 

■ 11 ■ hTCC1 ■ 

FFAGVPGLTGATSGLSQVTGLFGAAGLSAS 



Friday, July 23, 1399 3:*1 AM Q 
hTCCl (ffl-TM2 Mac.MPO f*> 1225) Sita - ; Secuerrca _____f 

7CGGGC77GGC7C ACGCGGA7AGCC 7GGCGAGC7CAGCCAGC77GCCCGCCC "GGCCGGCA 77GGGGGC f ~^G7^ r -G ~7 r 7G r G^~C r 7'*^ 

-i ■ ! ■ i 1 ? i ' i : ~- ! : ~ ~ J ' sio 

AGCCCG A ACCG AG 7GCGCC TATCGGA CCGCTCGAGfCGG "CGAACGGGCGGGA CCGGCC G T AACCCCCGCCCAGGCC AAA A CCCCCG A ACG 
■ ■" 1 1 hTCCl ~ 

3GLAHA0SLA5SASLPALAG i GGGSGPGGL 
CGAGCC 7GGC 7CAGG7CCA7GCCGCC7CAAC7CGGCAGGCGC 7ACGGCCCCGAGC 7GA 7GGCCCGG7CGGCG r CG r 7GC r GAGCAGG7C p ~^ 

; i ; : 1 ! * ' * ■ . I ■ 1: (coi 

GC7CGG ACC G AG 7CC AG G 7 AC GGCGGAG77GAGCCG7CCGCG A 7GCCGGGGC 7 CG AC "ACCGGGCCAGCCGCGGCGACGGC 7CG7CCAGCC 
■ ■ hTCCl ■ 

PSLAQVHAAS7RGALPPRAQGPVGAAAEGVG 

CGGGC AG7CGCAGC7GG7C7CCGCGCAGGG77CCCAAGG7A7GGGCGGACCCG7AGGCA7GGGCGGCA7GCACCCC7C77CGGGGGCG7 r G 

i ■ 1 i ' * s S 1 * ' I ' i . i , l!_ l0 go 

GCCCG7CAGCG7CGACCAGAGGCGCG7CCCAAGGG77CCA7ACCCGCC7GGGCArCCG7ACCCGCCG7ACG7GGGGAGAAGCCCCCGCA^C 
' ■ " ■ ■ hTCCl " ■ 

GGSQLVSAQGSGGMGGPVGrtGGMHPSSGAS 

AAAGGGACGACGACGAAGAAG7AC7CGGAAGGCGCGGCGGCGGGCAC7GAAGACGCCGAGCGCGCGCCAG7CGAAGC7GACGCGGGCGG7G 

* i . \ i ! ' i S < 1 ■ 1 , h , ; - t 183 

7TTC CC TGC 7GC TG C 7TC 7 7C A 7G AGC C T7C CGC GCCGCCGCCCG 7G AC 7 7C7GCGGC7CGCGC GCGG7CAGCT 7 CG AC 7 GCGCCCGCC AC 
□ ■ hTCCl 1 

i^KG 7 7 TXKYSEGAAAG7EOAERAPVEAOAGG 
S GGCA AAAGG7GC7GG7ACGAAACG7CG7C7AACGGCGAA77C 

OJ ! . 1 1 j . f \22S ' ' 

£0 CCG7777CCACGACCA7GC 777GCAGCAGA77GCCGC77AAG 
H i ' hTCCl- '" ■■ I IScaR 

m — 



G Q K V L V R N V V R R I 
Monday, July 26, 1999 3:25 PM

ht(184-392)-HS-ht(1-129).mpd (1 > 2232) Site and Sequence
Enzymes : 3 of 515 enzymes (Filtered)

Settings: Linear, Certain Sites Only, Standard Genetic Code

CA 7A 7GCA7C A CCATC ACC A 7C A CCA TG7GGCG3ACA 7C A 7CAAGQGC A 7CC 7CGG AG A AG 73 "'GGGAG77CA 7CAC AAACGCGCTCAACGGCC 7GAAAG 

j 1 : : . : : : i ; ■ , CQ 

GTA 7ACG 7 AGTGG TAG TGG TAGTGCTACACCGCC 7G 7 AG7 AG77CCCG~AGG AGCCTCT'CACACCC ~C A AG ~AG ? G 77 7GCGC GAG T7GCCGG AC TTTC 



Met / His tag | hTCC1 (184-392)



H n H H H H H H 0 V A 0 I i K G I L G E V W Z F { 7 N A L N G L K 
AGC77TGGGACAAGC TC A CGGGG7GGG7GACCGGAC7G~7C 7C 7CGAGGG TGG 7LGAACC7GGAG7CC7TCT77GCGGGCG7 rr CCGGC "TGACCGG'^C 

■ ■ ! i : i i i — : i 000 

7CGAAACCCTG7TCGAG7GCCCCACCC AC "GGCC 7GACAAGAG AGC 7CCC ACC AGC 7 TGG ACC TCAGGAAGAAAC3CCCGCAGGGGCCGA AC 7GGCCGCG * 



hTCC1 (184-392)



ELWQKLTGWVTGLPSSGWSNLESFFAGVPeL TGA 

GACCAGCGGCTTGTCGCAAGTGACTGGCT7GTTCGG7GCGGCCGGTCTGTCCGCArCGTCGGGC7TGGC7CACGCGGATAGCCTGGCGAGCTCAGC r AGC 
1 i ■ i 1 i 5 i i r— : j , , 1. 200 

CTGGTCGCCGAACAGCGT7CACTGACCGAACAAGCCACGCCGGCCAGACAGGCGTAGCAGCCCGAACCGAGTGCGCCTA7CGGACCGCTCGAG7CGGTCG 

TSGLSQVTGLFGAAGL5A3SGLAHAQ3LASSA5 

TTGCCCGCCCTGGCCGGCA T7GGGGGCGGG7CCGG7777GGGGGC7TGCCGAGCC 7GGC7CAGG7CCA7GCCGCC7CAAC7CGGCAGGCGCTACGGCCCC 
1 1 ■ i ■ i i . i ' ■ 1 ; ■ ! , r iiQO 

~ AACGGGCGGGACCGGCCG7AACCCCCGCCCAGGCCAAAACCCCCGAACGGCTCGGACCGAG7CCAGGTACGGCGGAGT7GAGCCGTCCGCGATGCCGGGG 

hTCC1 (184-392)

'JLPALAG I GGG3GFGGLPSLAGVHAA3TRQALRP 

G fl GAGC TG A TGG CCC GGTC GGCGCCGC 7GCCGAGC AGGTCGGCGGGC AG TCGCAGC TGG TCTCCGCGCAGGGTTCCC A A GGTA7GGGCGG AC CCGTAGGC AT 

IB ' i ! ' f ' ■ ' ! i r ■ i 1 ^500 

pf| CTCGACTACCGGGCCAGCCGCGGCGACGGCTCGTCCAGCCGCCCGTCAGCGTCGACCAGAGGCGCGTCCCAAGGGTTCCATACCCGCCTGGGCATCCGTA 

hTCC1 (184-392)

%£ R A 0 G P V G A AA-EG V GG G 3 Q L V SA QGSQG'm'G G F V " G M 

m 

GGGCGGCATGCACCCCTCTTCGGGGGCGTCGAA4GGGACGACGACGAAGAAG7ACTCGGAAGGCGCGGCGGCGGGCAC7GAAGACGCCGAGCGCGCGCCA 
f , i i 1 i ' i . i . : \ , , 1 1 1 600 

H 8 CCCGCCG7ACGTGGGGAGAAGCCCCCGCAGCTTTCCCTGCTGCTGCTTCTTCATGAGCCTTCCGCGCCGCCGCCCGTGACTTCTGCGGCTCGCGCGCGGT 

hTCC1 (184-392)

^ GGMHPSSGASKGTTTKKYSEGAAAG7EDAERAP 

B 

fl; G7CGAAGCTGACGCGGGCGGTGGGCAAAAGGTGCTGGTACGAAACGTCGTCGAATTCATGGTGGA7TTCGGGGCGTTACCACCGGAGA TCAACTCCGCGA 

1 i i 1 i 1 i 1 i 1 j 1 H , 1 , j_ 7 00 

C AGC TTCG ACTGC GCCC GCC A CCCG TT TTCC ACG ACC A 7GCTTTGCAGC AGC TTA A GTACC AC CTAAAGCCCCGCA A TGGTGGCCTCTAGTTGAGGCGCT 

hTCC1 (184-392) | EcoRI | TbH9

VEAOAGGGQXVLVRNVVEFMVOFGALPPEINSA 

GGATGTACGCCGGCCCGGGTTCGGCCTCGCTGGTGGCCGCGGCTCAGATGTGGGACAGCGTGGCGAGTGACCTGTTTTCGGCCGCGTCGGCGTTTCAGTC 
, j , f 1 i , j , 1 1 \ : . 1 l j , r 800 

CCTACATGCGGCCGGGCCCAAGCCGGAGCGACCACCGGCGCCGAGTCTACACCCTGTCGCACCGCTCACTGGACAAAAGCCGGCGCAGCCGCAAAGTCAG 

■ ■ " rt - 1 -'" 

Rli Y AGP G SASLV^AAAQMWO 5 VA SOLFSAASAFGS 

GGTGGTCTGGGGTCTGACGGTGGGGTCGTGGATAGGTTCGTCGGCGGGTCTGATGGTGGCGGCGGCCTCGCCGTATGTGGCGTGGATGAGCGTCACCGCG 
1 1 , 1 1 j , j j ( , : } , • ( 1 f ' r 900 

CCACCAGACCCCAGACTGCCACCCCAGCACCTATCCAAGCAGCCGCCCAGACTACCACCGCCGCCGGAGCGGCATACACCGCACCTACTCGCAGTGGCGC 
■ i " ■ ' ■! ■ " TbH9 i i j i i i n 

VVWGLTVGSWIGSSAGLMVAAASPYVAWMSVTA 



o 



Fig. 10

Sheet 1 of 3






GGGCAGGCCGAGC7GACCGCCGCCCAGG7CGGGG77GC7GCGGCGGCC7ACGAGACGGCGTA ~00GC7GACGG~GCCCCCGCCGG7GA 7CGCCG4GAACC 

CCCGTCCGGC TCGACTGGCGGCGGG 7CCAGGCCCiACGACGCCGCCGGA7GC7C 7GCCGCA 'ACCCGAC 7GCCACGGGGGCGGCC AC 7 AGCGGC7C 7 7GG 
. . . l T uun n | m iMmmmm , 



G Q A £ L 7 A A Q V R V A A A A Y £ T A Y G L 7 7 ? 



? P V [ A £ N 



G 7GC 7GAAC 7GA 7GA77C 7GA7AGCGACCAACCTC77GGGGCA^AACACCCCGGCGA7CGCGG7CAACGAGGCCGAA "AC-G r G ' ^A 7GTG^GC r CAAG A 
: 1 : f ; : 5 , r " " J ^ , lG0 

CACGAC "TGAC 7AC7AAGAC7A7CGC 7GG 7 7GGAGAACCCCG 777 7G7GGGGCCGCTAGCGCC AG 77GC7CCGGC '7A 7QCCGC7C 7A C AC CCGGGTTC 7 
~ UMrt -- i 



R A £ L M t L I A 7 N L L G G N 7 P A [ 



V N E A i Y G £ n W A Q 0 



CGCCGCCGGGAfGTT7GGC7ACGCCGCGGCGACGGCGACGGCG ACGGCGACG77GC7GCCG7T CGAGGAGGCGCCGGAGA7GACCAGCGCGGG7GGGC7C 

j i : i . j , i • , : _ , . u 

GCGGCGGCGC7ACAAACCGATGCGGCGCCGC7GCCGC7GCCGC7GCCGC7GCAACGACGGCAAGC7CCrcCGCGGCC7C7ACTGGTCGCGCCCACCCGAG 

i i i TbH9 '■"■'"'■■■■-■■■■■"■^^ 

AAAMFGYAAA7A7A7A7LLPP£r A? r MT3AGGL 

C7CGAGCAGGCCGCCGCGG7CGAGGAGGCC7CCGACACCGCCGCGGCGAACCAG7TGA7GAACAATG7GCCCCAGGCGC7GCAA ; *AGCTGGCCCA r, CCCA 
: 1 . s : , ; : , : 1 : J! u \3GQ 

GAGC7CG7CCGGCGGCGCCAGC7CC7CCGGAGGC7G7GGCGGCGCCGC7TGG7CAACTAC7TG77ACACGGGG7CCGCGACG77G7CGACCGGGTCGGG7 

yU£GAAAV££A507AAAMQLrtNNVPQALGGLAaP 

CP CGCAGGGCACCACGCCTTCTTCCAAGC7GGG7GGCC7G7GGAAGACGG7C7CGCCGCATCGG7CGCCGA7CAGCAACA7GG7G"CGA7GGCCAACAACCA 

Ofl ' ! ! i 1 i ! ' ' - ' i i * u iaco 

^ GCG7CCCG7GGTGCGGAAGAAGG77CGACCCACCGGACACC77C7GCCAGAGCGGCGTAGCCAGCGGCTAG7CG77G7ACCACAGCTACCGG77Gr7GG7_ 
i TbH9 — 1 

MTQGTTPSSKLGGLWKTV3PHas?rSNMV3l1A-NNH 

m 

CATG7CGATGACCAAC7CGGG7G7G7CGA7GACCAACACC77GAGC7CGA7G7TGAAGGGCr77GCTCCGGCGGCGGCCGCCCAGGCCG7GCAArACCGCG 
7_ ' ' 1 ! ' 5 1 ; * ' i ' i i i f ' r 1500 

G7ACAGC7ACTGGTTGAGCCCACACAGC7AC7GG77G7GGAACTCGAGC7ACAACT7CCCGAAACGAGGCCGCCGCCGGCGGG7CCGGCACG777GGCGC 

c ~ ^ 

^ MSMTNSGV5firNTLSSl1LKGFAPAAAAQAVQTA 

C 

O GCGCAAAACGGGG 7CCGGGCGA 7GAGC7CGCTGGGCAGC7CGC7GGG77C77CGGG7C 7GGGCGGTGGGG7GGCCGCCAAC77GGG7CGGGCGGCCTCGG 

- I i i i f . ! i i r 1 , , , 1 , k 1600 

CGCG7TT7GCCCCAGGCCCGC7ACTCGAGCGACCCG7CGAGCGACCCAAGAAGCCCAGACCCGCCACCCCACCGGCGG77GAACCCAGCCCGCCGGAGCC 

TbH9 - 

AGNGVRAMSSLGSSLGSSGLGGGVAANLGRAAS 

TCGGTTCGTTGTCGGTGCCGCAGGCCTGGGCCGCGGCCAACCAGGCAGTCACCCCGGCGGCGCGGGCGCTGCCGCTGACCAGCCTGACCAGCGCCGCGGA 
1 f • ' ' i ' ! 1 ^ * 1 i . s i f- 1700 

AGCCAAGCAACAGCCACGGCGTCCGGACCCGGCGCCGGTTGGTCCGTCAGrGGGGCCGCCGCGCCCGCGACGGCGACTGGTCGGACTGGTCG'CGGCGCCT 

VGSLSVPGAWAAANQAVTPAARALPl.TSLTSAA£ 
AAGAGGGCCCGGGCAGATGCTGGGCGGGC7GCCG(?7GGGGCAGATGGGCGCCAGGGCCGG7GGTGGGCTCAG7GG7G7GC7GCG7G7TCCGCCGCGACCC 

i i f f f 1 ' i f i . ■ 1 (_ j , j , L. igoo 

TTCTCCCGGGCCCGTCTACGACCCGCCCGACGGCCACCCCGTCTACCCGCGGTCCCGGCCACCACCCGAG7CACCACACGACGCACAAGGCGGCGCTGGG 

R-GPGQMLGGLPVGGMGARAGGGLSGVLRVPPRP 

TATG TGA TGCCGCA 7TCTCCGGCAGCCGGCGA 7A 7CA7GAGCAGAGCGTTC A TCATCGATCCAACGArCAG7GCCATTGACGGCTTGTACGACCTTCrGG 
1 1 i i 1 1 1 i f 1 i , 1 , ( , j , h 19Q0 

ATACACTACGGCGTAAGAGGCCGTCGGCCGCTATAG7AC7CG7CTCGCAAGTAGTAGCTAGGTTGCTAG7CACGG7AACTGCCGAACA7GCTGGAAGACC 



TbH9 | EcoRV | hTCC1 (1-129)



YVMPHSPAAGQ [ M S R A F I IOPT[3A[OGLYDLL 

Fig. 10




GGA 7 7GGAA 7AC C C AACCAAGG GGG7A7C C 777 -C7CC 7 CAC 7AGAG 7AC 77CGA '^AAAGCCC ~GGAGGAGC7GGC *GCAGCG777CCGGG7GA 7GGC 7G 

■ : ' ' ~ ~ '' ' ' ' — = — ' 3; ; ■ - OCCO 

CC7*ACCT7AtGGG77GG 77CCCCCATAGGAAA7GAGGAG7QA7C7CATGAAGC ~£ GGGACC 7CC 7CGiCCG7CG7CGC A AAGGCCCAC7ACCG AC ~ 
Mmv ii .i i ■ i nr - ■■■ i mi . ■■n hTCCl ti^oct\^ r — - ■ ■ , , , 

G I G tPNGGG IL'fSSLSYFcKALSSLAAAFPGOGW 
GuAGGTTCGGCCGCGGACAAATACGCCGGCAAAAACCGCAACCACG^ 

. : ? ; * h 5 h _ — : i — - Hil 9 100 

CAA TCCA A-GC CGGCGC C 7G 7 7TA 7GCGGCCG 77777GGCG773G 7GCACT 7A AAA AAGG7CC 7 7GACCG7C7GGAGC 7AGC AG 7CGAG 7AG7CGGAC7AG ~" 

» ■ t m i i ■ 1 ■ l ivrcci M L 

UGSAAOKYAGKNSNHYNrraSLAOuORQL 1 S L t 

CACGACCAGGCCAACGCGG7CCAGACGACCCGCGACA7CC7GGAGGGCGCCAAGAAAGG7CTCGAG77CG7GCGCCCGG7GGC7G7GGACCTGACC7ACA 
! ' "* — ' : 1 H 1 ' r ; ■ i— ' — 

GTGCrGGTCCGGTTGCGCCAGG7C7GC7GGGCGC7G7AGGACC7CCCGCGG77CTT7CCAGAGC7CAAGCACGCGGGCCACCGACACCTGGAC7GGA7G7 

mm i i Nn hTcci r ; r~; 

' H 0 G A N A V Q T T R 0 t L £ G A X X G L £ F V R ? y a V 0 L 7 Y 
TCCCGGTCGTCGGGCACGCCCTATAAGATATC
AGGGCCAGCAGCCCGTGCGGGATATTCTATAG
IPVVGHAL.



2232 



Fig. 10

Sheet 3 of 3



Monday, July 25, 1999 2:42 PM

ht(1-149)-H9-ht(161-392).mpd (1 > 2365) Site and Sequence
Enzymes : 3 of 515 enzymes (Filtered)

Settings: Circular, Certain Sites Only, Standard Genetic Code

CA7A7GCA7CACCA7CACCA7CACA7GrAGC'AGAGCG77CA7CA ^ £-^~GCAACGA7CAG7GCCAj7G A CGGC77G7ACGACC77C7G r nG * 

I , ; , ; : : ; ; , ; Z ■ ? ; ' " * ~ f 

GTA TACGTAG 7GGf AG 7GG7AGIG7AC 7CG 737 CGCAAG7AG TAG C7AGGT 7GC7AG ~CACGG~AACTGCCGAACA7GCTGGAAGACCCC T 

Met / His tag | hTCC1 (1-149)

H fl K H H H H H M 5 R A F ( E DP f ISA iOGLYOLUG 

rrGGA A TACCC A ACC A AGGGGGT A fCC "TT ACTCC "CACTAGAG T AC TTCGAA A AAGCCCTGGAGGAGCTGGCAGCAGfGrTTCCGGGrGA 

, ; : 1 : . : j " ^ ' ' - ! „„ 

AACCTTArGGGrTGGrrCCCCCArAGGAAATGAGGAGTGArCT"CArGAAGCTTTrrCGGGACCTCC7CGACCGTCGrrGCAAAGGC r CACr 
hTCC1 " 

(GEPNQGGILYSSL£YFEKAL££l AAAFPG0 

TGGC TGGTTAGG T TCGGC CGCGG AC AAA T ACGCCGGCA AAA AC CGCAACCACGTG A A rTTTTTCCAGGAACTnGCAGACCTCGATC^TC AG 

' '■• ' f ! ! > "i ! ► I ■ 273 

' . ACCGACCAArCCAAGCCGGCGCCTGrTTArGCGGCCGrrTTTGGCGTTGGrGCACrTAAAAAAGGrcCTTGACCGrCTGGAGCTAGCAGTC 
hTCC1 (1-149)

GWLGSAAOKYAGKNRMHVNFFQELAOLQRQ 

CTCATCAGCCTGATCCACGACCAGGCCAACGCGGrCCAGACGACCCGCGACATCCTGGAGGGCGCCAAGAAAGGrCTCGAGTTCGTGCGCC 

- < ' f ' i ' i ' 1 i . - , . , , 36 a 

g GAGrAGTCGGACTAGGTGCTGGTCCGGTTGCGCCAGGTCrGCTGGGCGCTGrAGGACCrcCCGCGGrrcrTTCCAGAGCTCAAGCACGCGG 
yj ■■■■■■■■■ hTCCl r A A ~y~ ' 

J L I 3L i HDGANAVGTTRQ [ LEGAKKGLEFVR 

Bfl CGGTGGCTGTGGACCTGACCTACArCCCGGTCGTCGGGCACGCCCTATCGGCCGCCT7CCAGGCGC^GTTTTGCGCGGGCGCGATPGCc"GT 

gi h 1 * i < I S 1 i ' i s i : i » 1 u a 55 

\| GCC ACCGACACC TGGAC TGG A rG7AGGGCCAGCAGCCCGTGCGGGATAGCCGGCGGAAGGTCCGCGGCAAAACGCGCCCGCGCTACCGGC A 

; PVAVOLTYIPVVGHAUSAAFQAPFCAGAMAV 

£3 AGTGGGCGGCGCGC7TAAGC77ATGG7GGA7T7CGGGGCGTTACCACCGGAGA'7CAAC7CCGCGAGGATGTACGCCGGCCCGGG7TCGGCC 

f * ; ; ; ' * ! 1 ' * : i < i * } ^ 546 

TCACCCGCCGCGCGAArTCGAATACCACCrAAAGCCCCGCAArGGTGGCCTCTAGTTGAGGCGCTCCTACATGCGGCCGGGCCCAAGCCGG 

hTCC1 (1-149) | Hind3 | TbH9

O VGGALKLMVQFGALPPE [ N3ARMYAGPGSA 

TCGC7GG7GGCCGCGGCTCAGArGTGGGACAGCG7GGCGAGrGACCTG7TTTCGGCCGCG7CGGCGT7TCAG7CGGTGG7C7GGGGTCTGA 
1 . i . 1 . i * i i 1 , E , 1 ^ 1 , Q37 

AGCGACCACCGGCGCCGAGTC7ACACCCTG7CGCACCGC7CACTGGACAAAAGCCGGCGCAGCCGCAAAGTCAGCCACCAGACCCCAGACT 

SLVAAAQMW05VA30LFSAA3AFQ5VVWGL 

CGGTGGGGTCGTGG A TAGGT7CG7CGGCGGG7C7GA7GGrGGCGGCGGTC7CGCCGT A 73 7GGCGTGGA7GAGCG7CACCGCGGGGCAGGC 
! 1 ! f i < rH * i « I i f . j , [ 1 

GCCACCCCAGCACCTATCCAAGCAGCCGCCCAGACTACCACCGCCGCCAGAGCGGCATACACCGCACCTACfCGCAGTGGCGCCCCG7CCG 

i ■ TbH9 ■ 

TVGSW { G3SAGLMVAAVSPYVAWMSVTAGQA 

CGAGCTGACCGCCGCCCAGGrCCGGG77GC7GCGGCGGCCTACGAGACGGCGTA7GGGC7GACGGTGCCCCCGCCGGTGA7CGCCGAGAAC 
— I 1 1 f ! 1 i i i ( f 1 l~ ; [ 1 i f— 

GCTCGACTGGCGGCGGGTCCAGGCCCAACGACGCCGCCGGATGCTCrGCCGCATACCCGACTGCCACGGGGGCGGCCACTAGCGGCTCTTG 

TbH9 1 

ELTAAGVRVAAAAYETA YG LTVPPPV I A E N 



728 






CGTGC TGAAC TG A TG A TTC TQA TAGC GACCAACC 'C'fGGGGCA A A AC ACCCCGGCGA'CGCGGTCiAClAGGCCG A 'A TACGG^GAG A TG 7 

+~ : = ! i ' J : 1 ' 1 ~ ^ ~ i h 9i0 

GC A CG AC 7 7G AC 7 AC TA AG AC 7 A 7CGC 7GG 77GGAGA iCCCCG7 7 7 7G 7GGGGCCGC7AGCGCC AG 7 7GC7CCGGC1* 7 A 7GCCGC7C 7 AC A 
a n i ■ ■ ■ ■ ■ ■ ■ -pi TdHS 

R A E L M I L E A 7 N L L G Q N T ? A i A V n £ a E Y G E fl 

GGGCC CAAGACGCCGCCGCGA7G 77 7GGC "ACGGCGCGGCG^CGGCG AC GGCGACGGCGACG77GC7GCCG7TrQAGGAGGCGC rr: G AG A 7 
. ■ . 1 : : ' \ ' i \ -J . ; 1 10 Ol 

CCCGGG77CTGCGGCGGCGC7ACAAACCGArGCGGCGCCGC7GCCGC7GCCGC7GCCGC7GCAACGACGGCA4GC7CC7CCGCGGCCTCTA 



WAQOAAAMFGYAAATATATATLLPF££AP 



GACCAGCGCGGG7GGGC7CCTCGAGCAGGCCGCCGCGG7CGAGGAGGCC7CCGACACCGCCGCGGCGAACCAG77GA7GAACAA7G7GCCC 

■• i 1 i ' I i 1 : '• i i -j 1 j , 1 1Q g^ 

C7GGTCGCGCCCACCCGAGGAGC7CG7CCGGCGGCGCCAGC7CC!CCGGAGGCrG7GGCGGCGCCGC77GG7CAAC7AC7rG7TACACG-GG 

TSAGGLL£QAAAV££ASO jAAANaLMNNVP 

CAGGCGC 7GC A A CAGC 7GG CC CAGCCCACGCAGGGCACCACGCCTTC 7 7CC A AGC7GGGTGGCC7G7GGAAGACGG7C7CGCCGC A TCGG 7 

. f : i ! i i ' i H • 1 ! I 133 

GTCCGCGACGTTGTCGACCGGGTCGGGTGCGrCCCGTGGTGCGGAAGAAGGTTCGACCCACCGGACACCTTCTGCCAGAGCGGCGTAGCCA 
O TbHS — — — _ — _ ___ 

SQALQGLAGP7QG77PS5KLGGLWK7VSPHR 

m 

i3CGCCGA7CAGCAACA7GG7G7CGA7GGCCAACAACCACA7G7CGA7GACCAAC7CGGGTG7G7CGA7GACCAACACC77GAGC7CGA7G7T 

ffl— * 1 '< i ! i ' ' 1 i i : H 1 i i 1 \27 f 4 

31 GCGGC 7AGTCG 7 7GT ACC A CAGC 7 ACCGG77G77GG7G7ACAGC7AC7GG 7 TGAGCCC AC ACAGC7AC7GG7TG7GGA AC 7CGAGC 7 AC AA 
%£ . 1 TbH9 1 ■ ' 

iUsP [ SNMVSMANNKMSMTNSGVSMTNTLSSML 

■i 

GAAGGGCTTTGCTCCGGCGGCGGCCGCCCAGGCCGTGCAAACCGCGGCGCAAAACGGGGTCCGGGCGATGAGCrCGCTGGGCAGCTCGCTG 

f 1 h ? ' i « 1 * ^ i 4 1 " i 1 E 1 i - 1365 

Li CTTCCCG A AACGAGGCCGCCGCCGGCGGG7CCGGCACG7T7GGCGCCGCG7TTTGCCCCAGGCCCGC7AC7CGAGCGACCCG 7 CGAGCG AC 
q i i ' i ■ ■ TbH9 ] 1 

O KGFAPAAAAQAVQTAAQNGVRAMSSLGSSL 

Q 

GG77C77CGGG7C7GGGCGGTGGGG7GGCCGCCAAC7TGGG7CGGGCGGCC7CGGTCGGTTCG77GTCGG7GCCGCAGGCCTGGGCCGCGG 
1 , 1 1 { ; i 1 f i i f s i ; , j 1— 1455 

CCAAG A AGCCC AG ACCCGCC ACCC C ACCGGCGG77GA ACCC AGCC CGC C GGAGC C AGCC A AGCA AC AGCCACGGCGTCCGG AC CCGGCGCC 

■ i i TbH9 —o—-—— »----«»»-_-«_~_-«<-. 

GSSGLGGGVAANLGRAASVGSLSVPQAWAA 

CC AA CCAGGC AG TCACCCCGGCGGCGCGGGCGC7GCCGCTGACCAGCC7G A CCAGCGCCGCGGAA AG AGGGCCCGGGC AG A 7GC7GGGCGG 

( , j 1 j 1 ( i « s i 1 : 1 1 j . \$aj 

GGTTGGTCCGTCAGTGGGGCCGCCGCGCCCGCGACGGCGACTGGTCGGACTGGTCGCGGCGCCTTTCTCCCGGGCCCGTCTACG ACCCGCC 

i i ii » TbH9 "» ■ 

ANGA V TP A ARALPL.TSLT SAAERGPGQMLGG 

GCTGCCGG7GGGGCAGATGGGCGCCAGGGCCGGrGG7GGGCTCAGTGGTGTGCTGCG7GTTCCGCCGCGACCC7ATG7GATGCCGCATTCT 

— I 1 j , { 1 1 i ~f f 1 1 * , j 1 f j 1638 

CGACGGCCACCCCGTCTACCCGCGGTCCCGGCCACCACCCGAGTCACCACACGACGCACAAGGCGGCGCTGGGATACAC7ACGGCGTAAGA 
■ ii ■ i TtaH9 i i 

LPVGQMGARAGGGLSGVLRVPPRPYVhPHS 

Fig. 11




CCQGCAGC CGGC A AGC T TAC TC A AC TCCTCAAiTTGCTTGCC 'A A A TTGGCGQAGTTGQ 7CGC3GCCGCC ATTGCGGACATCA" 77 CGQ A ~G 

GGCCG7CGGCCG77CGAA7GAG7 7G AGG AG 77 7AACGAACGG7 7 7AACCGCC7C A ACCAGCGCCGGCGG 7AACGC?7G 7 AG 7AAAGCC 7Af" ' ^ 

hTCC1 (161-392)




I A G E [ S 0 

7GGCGG AC A TC A 7C A AGGGC A 7CCTCGGAGAAG7G7GGGAG7TCA7C AC A A ACGCGCTC A A CGGCC TGAAAGAGC 77 7GGGAC A AGC 7C AC 

-t : I i i '■ : i ! : : : ; 1 , u | goQ 

ACCGCCrGrAGrAGTrCCCG7AGGAGCC7C77CACACCCrCAAGTAGrGrrrGCGCGAGrrGCCGGACTT7CTCGAAACCCTGrTCGAG7G 
■ ■ ■ ' ■' hTCCI ~ 

V A 0 I I K G I L G £ V W £ F [ 7 N A L N G* L K E L W 0 K L 7 

GGGG7GGG7GACCGGAC7G77C7C7CGAGGG7GG7CGAACC7GGAG7CC77C777GCGGGCG7CCCCGGC77GACCGGCGCGAC r AGCG^'" 
' ! ' '• i ' i 1 * ! i r— i 1-1 Z II 

CCCCACCCACTGGCC7GACAAGAGAGC7CCCACCAGCT7GGACC7CAGGAAGAAACGCCCGCAGGGGCCGAAC7GGCCGCGCTGG7CGGCG 

hTCC1 (161-392)

GWVTGLFSRGWSNLESFrAGVPGLTGATSG 

T7GTCGCAAGTGAC7GGC77G77CGG7GCGGCCGG7C7G7CCGCATCG7CGGGC7 7GGC7CACGCGGA7AGCC7GGCGAGC7CAGCCAGCT 

i 1 i i ■ 1 \ 1 f f : 1 1 ■ ■ , , > , ; 20Q2 

AACAGCG77CAC7GACCGAACAAGCCACGCCGGCCAGACAGGCG7AGCAGCCCGAACCGAG7GCGCC7A7CGGACCGC7CGAG 7CGG7CGA 

p ■ ■ ■ r hTCCI '* o * nno ; 



191 



^LSQV 7GLFGAAGLSA33GLAHA05LASSA3 

7GC C CGC C C 7GG C C GGC A 7 7GGGGGC GGG 7C CGG7T77GGGGGC 7 7GCCG A GC C 7GGC7CAGG7CC A 7GCCGCC7C A AC TCGGCAGGCG £-7 

i f i ■ ; f I * i ■ \ i i i > ^ 2G93 

ACGGGCGGGACCGGCCG7AACCCCCGCCCAGGCCAAAACCCCCGAACGGC TCGGACCGAG7CCAGGTACGGCGGAGTTGAGCCGTCCGCGA 



hTCC1 (161-392)



2134 



LPALAG I GGGSGFGGLPSLAGVHAASTRQAL 

ACGGCCCCGAGCTGATGGCCCGGfCGGCGCCGC7GCCGAGCAGG7CGGCGGGCAG7CGCAGC7GG7CTCCGCGCAGGG7TCCCAAGGTATG 
— \ \ j 1 f- 1 1 i 1 i 1 — 1 • ' ■ i * t t 

7GCCGGGGC7CGAC7ACCGGGCCAGCCGCGGCGACGGC7CG7CCAGCCGCCCG7CAGCG7GGACCAGAGGCGCG7CCCAAGGG77CCA7AC 
hTCC1 (161-392)

RPRADGPVGAAAEQVGGQSCLVSAQGSQGM 

GGCGGACCCGTAGGCATGGGCGGCA7GCACCCC7CT7CGGGGGCG7CGAAAGGGACGACGACGAAGAAG7A.C7CGGAAGGCGCGGCGGCGG 

-f f i i ! 1 1 1 1 1 1 f [ , 1 , j f- 2275 

CCGCC7GGGCA7CCG7ACCCGCCG7ACG7GGGGAGAAGCCCCCGCAGC777CCC7GC7GC7GC77C77CA7GAGCCTTCCGCGCCGCCGCC 

hTCC1 (161-392)

GGPVGMGGMHPSSGASKGTT7KKYSEGAAA 

GC AC 7G A AG ACGC CG AG C GCGCGC C AG 7CG AA GC 7G A CGCGGGC GG 7GGGC AA A AGG7GC 7 GG 7 ACGA A AC G7CG7CT A ACGGCGA A 7 TC 
1 , f 1 E i E t 1 1 i r~ > i i 1 *~ 2365 

CG TG AC TTC 7GCGGC TCGC GCGCGG TC AGC TJCG AC TGCGCCCGCCACCCGTTTTCCACG A CC A TGCTTTGC AGC AG A TTGCCGCT 7 A AG 



hTCC1 (161-392) | EcoRI

G7£QA£RAPVEA0AGGGG;<VLVRNVV . RR I 



Fig. 11



Monday, July 25, 1999 2:43 PM

ht(184-392)-H9-ht(1-200).mpd (1 > 2445) Site and Sequence

Enzymes : 3 of 515 enzymes (Filtered)

Settings: Linear, Certain Sites Only, Standard Genetic Code

CA7A fGCA TCACCA 7CACCA 7CACGA 7G7_"GCGGACATCA 7CAAGGGCA 7CC 7CGGAGAAG 7G7GGGAG7 7CA7CACAAACG'£GC7CAACGGCC 7GAAAG 

; . i ; i ■ | i _ _ , ; ^ ^ 

G7ATACGTAGrGGrAGrGGiAG!GCTACACCGCCTGrAGrAGr7CCCG7AGGAGCC7CrrCACACCCTCAAG7AG7G7 7 7GCGCGAG7TGCCGGAC77TC 
Met / His tag | hTCC1 (184-392)

HMHHHHKHQVAQ[[KGfLG£Vy£r[TNALNGI !< 

AGC777GGGACAAGCTCACGGGG7GGG7GACCGGAC7G77C 7CTCGAGGG7GG7CGAACC7GGAG7CC7rC777QC , ~GGC'"7~ r CCG r '"77nACCGGC f:: C 
i ' ' 5 : '■■ 1 i 5 ^1^1 , % OQ0 

TCGAAACCC7Gr7CGAG7GCCCCACCCACrGGCCTGACAAGAGAGCrCCCACCAGC7rGGACC7CAGGAAGAAACGCCCGCAGGGGCCGAACTGGrCGCG 
; i ■ i ■■ ■i hTCCl MQjt ~' n ^ - 

SUWOKLTGWVrGLPSRGWSNLCSrFAGVPGLTGA 

GACCAGCGGCr7GTCGCAAG7GAC7GGCTTG77CGG7GCGGCCGG7C7GrCCGCA7CGrCGGGC77GGCrCACGCGGA7AGCCTGGCGAGC7CAGCCAGC 

CFGGTCGCCGA AC AG CG7TC AC 7GACCGA AC A AGCCACGCCGGCCAG AC AGGCG TAG CAGCCCGAACCGAG7GCGCCTArCGGACCGC7CGAG7CGGrC43 
hTCC1 (184-392)

TSGLSQVTGLFGAAGLSASSGLAHAQSLASSAS 

TTGCCCGCCC7GGCCGGCA7TGGGGGCGGG7CCGG7TT7GGGGGC7TGCCGAGCC7GGC7CAGGTCCA7GCCGCC7CAACrCGGCAGGCGCTACGGC r CC 
i ■ 1 . ■ i i , i , , , s : <iQQ 

AACGGGCGGGACCGGCCGTAACCCCCGCCCAGGCCAAAACCCCCGAACGGC7CGGACCGAG7CCAGG7ACGGCGGAG7TGAGCCGTCCGCGATGCCGGGG 
hTCC1 (184-392)

^? L P A L A G 1 GGGSGFGGLPSLAQVHAA3TRQALR? 

F 1 GAGC7GATGGCCCGG7CGGCGCCGC7GCCGAGCAGG7CGGCGGGCAG7CGCAGC7GGTC7CCGCGCAGGG7TCCCAAGGTA7GGGCGGACCCG7AGGCA7 

ffl ! f 1 i ' i : i ' i ' : s - ■ -qq 

gg C7CGACTACCGGGCCAGCCGCGGCGACGGC7CG7CCAGCCGCCCG7CAGCG7CGACCAGAGGCGCGTCCCAAGGG7TCCA7ACCCGCC7GGGCA'CCGfr "* 
ffl j hTCCl ™™ 

S.jRA0GPVGAAASQVGGaSQLVSAGGSQG"M"G"GPV-GI1 
GGGCGGCATGCACCCC7C7TCGGGGGCGrCGAAAGGGACGACGACGAAGAAGTACTCGGAAGGCGCGGCGGCGGGCAC7GAAGACGCCGAGCGCGCGC r A 

*. ' 1 ' i ' ^ * ' 5 ' = 1 ' i « If 300 

CCCGCCGTACG7GGGGAGAAGCCCCCGCAGC77TCCCrGCTGCrGCT7C77CArGAGCCTTCCGCGCCGCCGCCCG7GAC77CTGCGGC7CGCGCGCGGT 

H GGMH PSSGASKGTfTKKYScGAAAGTSOAERA 0 

Q 

G G 7CGAAGCTG A CGCGGGCGGTGGGC A A AAGG7GC 7GG 7 ACG A A ACG 7CG7CGAA7TCA7GGTGGA777CGGGGCGT7ACCACCGGAGA7CAACTCCGCGA 

P , . i i i i ■ i ■ f H 1 j ! « , f- 7GO 

8885 CAGCTTCGACTGCGCCCGCCACCCGTrrTCCACGACCATGCTTTGCAGCAGCTTAAGTACCACCTAAAGCCCCGCAArGGTGGCCTCTAGTTGAGGCGCT 
hTCC1 (184-392) | EcoRI | TbH9

VEADAGGGaKVLVRNVVEFMVQrGALPPElNSA 
GGATGTACGCCGGCCCGGGTTCGGCCTCGCTGGTGGCCGCGGCTCAGATGTGGGACAGCGTGGCGAG7GACCTGTTTTCGGCCGCGTCGGCGTTTCAGTC 

1 ! I i 1 i ■ i « i 1 : . f 1 : ■ 1 , > r 8QQ 

CCTAC A TGCGGCCGGGCCC AAGCCGG AGCGACCACCGGCGCCGAGTCTACACCC 7G 7CGC ACCGC 7C AC 7GGACAAAAGCCGGCGCAGCCGCA A AG 7C AG 

Tvun - 



R M Y A 



GP GSASLVAAAGMWOS V ASDLFSAASAFQS 

GGTGGTCTGGGGTCTGACGGTGGGGTCGTGGATAGGTTCGTCGGCGGGTCTGArGGTGGCGGCGGCCTCGCCGrArGTGGCGrGGATGAGCGTCACCGCG 
1 1 f i 1 : i i 1 i : . 1 1 i 1 k 1 'r 900 

CCACCAGACCCCAGACTGCCACCCCAGCACCTArCCAAGCAGCCGCCCAGACTACCACCGCCGCCGGAGCGGCATACACCGCACCTACTCGCAGTGGCGC 



■TbHSi 



VWGLTVGSWiGSSAGLMVAAASPYVAWMSVTA 



Fig. 12




GGGCAGGCCGAGC7G A CCGCCGCCC AGG TCCGGGrTGC'C-CGGCGGCCTACG AG ACGGCG 7 A 7GGGC 7GCCCCCGC CGQTG A TCQCCG AG A AC r 

CCCG7CCGGCTCGACrGGCGGCGGG7CCAGGCCCAACGACSCCGCCGGArGC^^ lCCC 
Tn . ir ■ ,,,/r .ii ■„ , m . ,, . ,,■ ■ „ ■ i i r i TbH9 i i , " ' " ^ 

GQA£UTAAGVRVAAA4yeTAYGLTv??p V [AFN 

G7GC7GAAC7GATGA77C7GA7AGCGACCAACC7C77GGGGCAAAACACCCCGGCGA7CGCGG7CAACGAGGGCGAA7ACGGCGAGA7G7GGGCCCAAGA 
RAELMILIATNLLGQNTPAIAVNEAEYGEMWAQD

CGCCGCCGCGArGTTTGGCrACGCCGCGGCGACGGCGACGGCGACGGCGACGTTGCrGCCGrrCGAGGAGGCGCCGGAGATGACCAGCGCGGGrGGGCrc 
GCGGCGGCGCTACAAACCGA7GCGGCGCCGC7GCCGCrGCCGC7GCCGC7GCAACGACGGCAAGC7CC7CCGCGGCC7C7AC7GG7CGCGCCCACCCGAG 
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^ L£QAAAV£eA3DTAAANau«NNV?QALQaUAGP 

?BCGCAGGGCACCACGCCr7C77CCAAGC7GGGTGGCC7G7GGAAGACGG7C7CGCCGCA7CGG7CGCCGA7CAGCAACA7GG7GrCGA-G^CCAACAAC-A 
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IgGCGTCCCGTGGTGCGGAAGAAGGTTCGACCCACCGGACACCTTCTGCCAGAGCGGCGTAGCCAGCGGCrAGrCGTTGrACCACAGCTACCGGrTGrTGGT 

m t&h& > 
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Q GCGCAAAACGGGGTCCGGGCGATGAGCTCGCrGGGCAGCTCGCrGGGTTCrrCGGGrCTGGGCGGTGGGGTGGCCGCCAACTTGGGTCGGGCGGCCTCGG 
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^ CSCGTTTTGCCCCAGGCCCGCTACTCGAGCGACCCGTCGAGCGACCCAAGAAGCCCAGACCCGCCACCCCACCGGCGGTTGAACCCAGCCCGCCGGAGCC 

AGNGVRAHSSLGSSLGSSGLGGGVAANLGRAAS 

TCGGTTCGTTGTCGGTGCCGCAGGCCTGGGCCGCGGCCAACCAGGCAGrCACCCCGGCGGCGCGGGCGCTGCCGCTGACCAGCCTGACCAGCGCCGCGGA 
' i ' i > 1 1 1 « i ' i 1 ■ -i ! 1 , s. l7Q0 

AGCCAAGCAACAGCCACGGCGTCCGGACCCGGCGCCGGTTGGTCCGTCAGTGGGGCCGCCGCGCCCGCGACGGCGACTGGTCGGACrGGTCGCGGCGCCT 

VGSLSVPaAWAAANQAVTPAARALPLTSLTSAAS 

AAGAGGGCCCGGGCAGA7GC7GGGCGGGC7GCCGG7GGGGCAGA7GGGCGCCAGGGCCGG7GG7GGGC7CAG7GG7G7GC7GCG7G77CCGCCGCGACCC 
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TTCTCCCGGGCCCGTCTACGACCCGCCCGACGGCCACCCCGTCTACCCGCGGTCCCGGCCACCACCCGAGTCACCACACGACGCACAAG^CGGCGCrGGG 
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A7ACACTACGGCGTAAGAGGCCG7CGGCCGC7A7AG7AC7CG7C7CGCAAG7AG7AGC7AGG77GC7AGTCACGG7AACTGCCGAACA7GC7GGAAGACC 
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CCTAACCT7A7GGGTTGG77CCCCCA7AGGAA A7GAGGAG7GA 7C7CA7GAAGC77TTTCGGGaCC7CC7CGACCG7CG7CGCAA'AGGCCCAC7ACCGAC 
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LGSAAOKYAGKNaNHVNFrQSLAOLORQLISLt 
CACGACCAGGCCAACGCGGTCCAGACGACCCGCGACArCCrGGAGGGCGCCAAGAAAGGrCTCGAGTTCGTGCGCCCGGTGGCTGTGGACCTGACCTACA 
GTGCTGG TCCGGTTGCGC CAGGTC 7GC7GGGCGC7GTAGGACCTCCCGCGG77C77TCCAGAGC7CAAGCACGCGGGCCACCGACACC7GGACTGGA7G7 

■HOGANAVGTTSQ IL£GA<KGLcrVRPVAV0L7Y 
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-377GCGAC7AG7TGCGC7GAG77GAGGAG777AACGAACGG77TAACCGCC7CAACCAGCGCCGGCGGTAACGCCTG7AG7AAAGCC7ACACCGCC7G7AG 
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TAG77CCCG7AGGAGCC7CTTCACACCCTCAAGTAGAT7CTA7AG 
Figure _: Nucleotide sequence of MTb59

cacgaccgcccgaGcgaacccgaaccagccagcaGaaaccgaagtiaggaagacgaaaagctatagc 

tgagccgacaatccccgctgatgacatccagagcgcaatcgaagagtacgtaagctcttccaccgc 

cgacaccagtagagaggaagtcgguaccgccgtcgatgccggggacggcaccgcacacgccgaggg 

ttztgccaccggCgatgacccaagagctigctcgaatccccgggcggaaccctcggcgtcgccGccaa 

cctcgacgagcacagcgncggcgcggcgatcctcggcgacctcgagaacatcgaagaaggtcagca 

ggtGaagcgcaccggcgaagticctaccggc-ccggccggcgacgggtuttitggggcgggtggttaa 

cccgctcggccagccgatzcgacgggcgcggagacgtGgactrccgatacticggcgcgcgcirggagct 

ccaggcgccctcggtggtgcaccggcaaggcgtgaaggagccgttgcagaccgggatcaaggcgat: 

tgacgcgatgaccccgaticggccgcggccagcgccagctgatcatcggcgaccgGaagaGcggcaa 

aaccgccgcctgcgtcgacaccatccccaaccagcggcagaaGtgggagtiGcggtgatcccaagaa 

gcaggcgcgctgtgtatacgtggccatzcgggcagaagggaactaccatcgccgcggtacgccgcac 

accggaagagggcggtgcgatggactzacaccaccatGgtcgcggccgcggcgtcggagtccgccgg 

tttcaaatggctztigcgccgtacaccggcticggcgatGgcGcagGactggatgtacgagggcaagca 

tgtgccgaucatcttcgacgacccgactaagcaggccgaggcataccgggcgatctcgctgctgcu 

gcgccgtccgcccggccgcgaggcccaccccggcgatgtgttctatccgcattcgcggGttttgga 

gGgctgcgccaaactgtGcgacgatctcggtggcggctGgctaacgggtctgGcgatcatcgagaG 

caaggccaacgacatctcggcctacatcccgaccaacgtcacctcgaticaccgacgggcaatgttt 

cccggaaarccgacctgttcaaccagggcgticcggccggccatcaaGgtGggtgtgtcggtgtcccg 

agtcggcggcgcggcgcagatcaaggctatigaaagaggtcgccggaagcctccgcttzggaccttztc 

gcaataccgcgagctagaagcttccgccgctttcgcttctgatiitggacgccgcatcgaaggcgca 

gttggagcgcggcgGGcggccggtcgagctgGiicaagcagcGgcaatcccagcccatgcccgttga 

ggagcaagtggtttcgatcttcctigggcaccggcggtcacctggactcggcgcccgtcgaggacgt 

ccggcggcucgaaaccgaattactggaccacatgcgggcctccgaagaagagattttgactgagat 

ccgggacagccaaaagctcaccgaggaggccgccgaGaagGtcaccgaggtcatcaagaacttGaa 

gaagggcctcgcggccaccggtggcggctctigtiggtgcccgacgaacatgtcgaggccctcgacga 

ggataagoccgccaaggaagccgtgaaggtcaaaaagccggcgGcgaagaagaagaaatagctaac 

catggctgccacactticgcgaactacgcgggcggatccgctcggcagggtcgaticaaaaagatcac 

caaggGccaggagctgattgcgacatzcgcgcatcgccagggcgcaggctzcggcticgagtccgctcg 

gccctacgcttttgagatcacccggatgcttaccacoctggccgctgaagccgcactggaccatzcc 
gttgct 



Figure _: Amino acid sequence of MTb59

MASLTIPADDIQSAIESYVSSFTADTSaSEVGTVVDAGDGIAHVEGL-PSVMTQSLLEFPGGILGVA 

LNLDEHSVGAVILGDFSNIEEGQQVKRTGEVLSVPVGDGFLGRVVNPLGQPIDGRGDVDSDTRRAL 

ELQAPSWHP.QGVKSPLQTGIKAIDAMTPIGRGQRQLIIGDRKTGKTAVCvTDTILNQRQNlvESGDP 
KKQVRCVYVAIC-QKGTTIAAVRRTLEEGGA*TO^ 

KHVLI I FDDLTKQAEAYRAI SLLLRRP5GSEAYPGDVFYLHSRLLSRCAXLSDDLGGGS LTGLP 1 1 
ETKAND 1 3AYI PTNVI S I TDGQCFL3TDLFNQGVRPAI NVGVS VSRVGGAAQI KAMKEVAGSLRLD 
LSQYRELEAFAAFASDU3AASKAQLERGARLVELLKQPQSQPMPVS2QWSIFLGTGGHLDSVPVE 

DVRRFETELLDKMRASEEEILTEIRDSQKLTEEAADKLTEVIKNFKKGFAATGGGSWPDEHVEAL 
DEDKLAJCEAVKVKKPAPKKKK 



Figure _: Nucleotide sequence of MTb32



ccagcccccgccccgcccacgccgaggcacgtggactgatggccaaagcgtcagagaccgaacgtic 

cgggccccggcacccaaccggcggacgcccagaccgcgacgcccgcgacggcccgacccccgagca 

cccaggcggcgt uccgccccgatcccggcgacgaggacaactucccccatccgacgcucggcccgg 

acaccgagccgcaagaccggatggccaccaccagccgggtgcgcccgccggccagacggccgggcg 

gcggcctggtggaaaEcccgcgggcgcccgatatcgatccgctugaggccctgatgaccaacccgg 

tggugccggagtccaagcggccctgccggaactgtggacgtcccgucggccggticcgactcggaga 

ccaagggagcttcagagggccggtgcccctattigcggcagcccgtactcgcccctgccgcagctaa 

atcccggggacatcgucgccggccagcacgaggtcaaaggctigcatcgcgcacggcggactgggct 

ggatctacctcgcccccgaccgcaatgticaacggccgtccggtggtgcticaagggcctggtgcati: 

ccggtgatgccgaagcgcaggcaatggcgatggccgaacgccagttcctggccgaggtggtgcacc 

cgtcgatcgtgcagatcntcaactcugtcgagcacaccgacaggcacggggatccggccggctaca 

tcgtgatggaatacgtcggcgggcaaccgctcaaacgcagcaagggtcagaaactgcccgtcgcgg 

aggccatcgcctaccugctggagaccctgccggcgctgagccacctgcatcccatcggcutggtct: 

acaacgacctgaagccggaaaacaucatgctgaccgaggaacagctcaagctgatcgacctgggcg 

cggtatcgcggatcaactcgttcggcnacctctacgggaccccaggcttccaggcgcccgagatcg 

tgcggaccggtccgacggtggccaccgacatzctacaccgtgggacgcacgctcgcggcgcccacgc 

tggacctgcccacccgcaatggccgttatgtggatgggctacccgaagacgacccggtgctgaaaa 

cctacgactcttacggccggttgcugcgcagggccatcgaccccgatccgcggcaacggttcacca 

ccgccgaagagatgtccgcgcaatzcgacgggcgtgttgcgggaggtggtcgcccaggacaccgggg 

tgccgcggccagggctatcaacgatcutcagtcccagtcggticgacatttggagtggacctgctgg 

tggcgcacaccgacgtgtatctggacgggcaggtgcacgcggagaagctgaccgccaacgagatcg 

tgaccgcgccgtcggtgccgctggtcgacccgaccgacgtcgcagcttcggccctigcaggccacgg 

tgctctcccagccggtgcagaccctagactcgctgcgcgcggcccgccacggtgcgctggacgccg 

acggcgtcgacttctccgagtcagtggagctgccgctaatggaagtccgcgcgctgctggatctcg 

gcgatgtggccaaggccacccgaaaactcgacgatctggccgaacgcgttggctggcgatggcgat 

tggtctggtaccgggccgtcgccgagctgcticaccggcgactatgactcggccaccaaacatttca 

ccgaggtgctggatacctttcccggcgagctggcgcccaagctcgccctggccgccaccgccgaac 

tagccggcaacaccgacgaacacaagttctatcagacggtgtggagcaccaacgacggcgtgatct 

cggcggctttcggactggccagagcccggtcggccgaaggtgatcgggtcggcgccgtgcgcacgc 

tcgacgaggtaccgcccacttctcggcatttcaccacggcacggctgaccagcgcggtgactctgt 

tgtccggccggtcaacgagtgaagtcaccgaggaacagatccgcgacgccgcccgaagagtggagg 

cgctgcccccgaccgaaccacgcgugctgcagatccgcgccctggtgctgggtggcgcgctggact 

ggctgaaggacaacaaggccagcaccaaccacatccdcggtttcccgttcaccagtcacgggctgc 

ggctgggtgtcgaggcgtcactgcgcagcctggcccgggcagctcccactcaacggcatcgctaca 

cgctggtggacatggccaacaaggtccggcccaccagcacgttctiaagccgcccgagtgtgaatcg 
Figure _: Amino acid sequence of MTb32

MAKASETERSGPGTQPADAQTATSATVRPLSTQAV^ 
VRPPVRRLGGGLVEIPRAPDIDPLEALMTNPWPESKRFCWN^ 

S ? Y 5 FL P Q LNPGD I VAGQ YE VKGC I AHGG LG W I YLALD F^TVNGRP^/lLKGLVTiSGDAEAQ^AiMAS 
RQ F LAS WKP S I VQ I FNF VEHTD RHGD ? VGY I VM E YVGGQ S L KRSKGQKLPVAEAIAYLLEILPAL 
SYLHSIGLVYNDLKPENIMLTEEQLKLIDLGAVSRINSFGYLYGTPGFQAPEIVRTGPTVATDIY^ 
VGRTLAALTLDLPTRNGRYVDGL?EDD?VLKTYD5YGRLLRRAIDPD?RQR?TTAEEMSAQL 
REWAQDTGVPRPGLSTIFSPSRSTFGVDLLVAHTDVYLD^^ 

VAAS VLQ AT VL S Q P VQ TLD S LRAAJIHGALD ADGVD F S E S VE L ? LMSVRAL LD LGD VAKATRKLD D L 

AERVGWRWRLVWYRAVAELLTGDYDSATKKFra 

WSTNDGVISAAFGLARARSAEGDRVGA.VRTIJDEVPPTSEIHFTTA 

IRDAARRVISALPPTEPRVLQIRALV^ 

VAPTQRHRYTLVDKANKVRPTSTF . 
Figure _: Amino Acid Sequence of secreted DPPD

DPPDPKQPCiMTKGYGPGGRv/GFC^^ 
G?PP PGGCGGAI P3 EQ PNAP 
SEQUENCE LISTING



Mtb41 (MTCC#2) 

(2) INFORMATION FOR SEQ ID NO: 140:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1441 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 140:

GAGGTTGCTG GCAATGGATT TCGGGCTTTT ACCTCCGGAA GTGAATTCAA GCCGAATGTA 60

TTCCGGTCCG GGGCCGGAGT CGATGCTAGC CGCCGCGGCC GCCTGGGACG GTGTGGCCGC 120

GGAGTTGACT TCCGCCGCGG TCTCGTATGG ATCGGTGGTG TCGACGCTGA TCGTTGAGCC 180

GTGGATGGGG CCGGCGGCGG CCGCGATGGC GGCCGCGGCA ACGCCGTATG TGGGGTGGCT 240

GGCCGCCACG GCGGCGCTGG CGAAGGAGAC GGCCACACAG GCGAGGGCAG CGGCGGAAGC 300

GTTTGGGACG GCGTTCGCGA TGACGGTGCC ACCATCCCTC GTCGCGGCCA ACCGCAGCCG 360

GTTGATGTCG CTGGTCGCGG CGAACATTCT GGGGCAAAAC AGTGCGGCGA TCGCGGCTAC 420

CCAGGCCGAG TATGCCGAAA TGTGGGCCCA AGACGCTGCC GTGATGTACA GCTATGAGGG 480

GGCATCTGCG GCCGCGTCGG CGTTGCCGCC GTTCACTCCA CCCGTGCAAG GCACCGGCCC 540

GGCCGGGCCC GCGGCCGCAG CCGCGGCGAC CCAAGCCGCC GGTGCGGGCG CCGTTGCGGA 600

TGCACAGGCG ACACTGGCCC AGCTGCCCCC GGGGATCCTG AGCGACATTC TGTCCGCATT 660

GGCCGCCAAC GCTGATCCGC TGACATCGGG ACTGTTGGGG ATCGCGTCGA CCCTCAACCC 720

GCAAGTCGGA TCCGCTCAGC CGATAGTGAT CCCCACCCCG ATAGGGGAAT TGGACGTGAT 780

CGCGCTCTAC ATTGCATCCA TCGCGACCGG CAGCATTGCG CTCGCGATCA CGAACACGGC 840

CAGACCCTGG CACATCGGCC TATACGGGAA CGCCGGCGGG CTGGGACCGA CGCAGGGCCA 900

TCCACTGAGT TCGGCGACCG ACGAGCCGGA GCCGCACTGG GGCCCCTTCG GGGGCGCGGC 960

GCCGGTGTCC GCGGGCGTCG GCCACGCAGC ATTAGTCGGA GCGTTGTCGG TGCCGCACAG 1020

CTGGACCACG GCCGCCCCGG AGATCCAGCT CGCCGTTCAG GCAACACCCA CCTTCAGCTC 1080

CAGCGCCGGC GCCGACCCGA CGGCCCTAAA CGGGATGCCG GCAGGCCTGC TCAGCGGGAT 1140

GGCTTTGGCG AGCCTGGCCG CACGCGGCAC GACGGGCGGT GGCGGCACCC GTAGCGGCAC 1200

CAGCACTGAC GGCCAAGAGG ACGGCCGCAA ACCCCCGGTA GTTGTGATTA GAGAGCAGCC 1260

GCCGCCCGGA AACCCCCCGC GGTAAAAGTC CGGCAACCGT TCGTCGCCGC GCGGAAAATG 1320

CCTGGTGAGC GTGGCTATCC GACGGGCCGT TCACACCGCT TGTAGTAGCG TACGGCTATG 1380

GACGACGGTG TCTGGATTCT CGGCGGCTAT CAGAGCGATT TTGCTCGCAA CCTCAGCAAA 1440

G 1441



(2) INFORMATION FOR SEQ ID NO: 142: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 423 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 142: 

Met Asp Phe Gly Leu Leu Pro Pro Glu Val Asn Ser Ser Arg Met Tyr 

1 5 10 15 

Ser Gly Pro Gly Pro Glu Ser Met Leu Ala Ala Ala Ala Ala Trp Asp 
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Ui UL 






Ser 
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Val 


Ser 
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t-xiy ber Val 
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Val 
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Thr- 
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Pro 


Trn 


Met 


Gly Pro Ala 


Ala Ala Ala 
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60 






1*1 C U 


a 7 a 




Ala 


Hid 


Th-r 

i xir 


Pro 


Tyr 


v CL J. 


Gly Trp 


Leu 


Ala 


Ala Thr Ala 
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70 










75 
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Leu 


rt._L a 


Lys 


vjIU 


Thr 


Ala 
Ma 


Tin v- 
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Ala 


Arg 


Ala 


Ala 


Ala Glu Ala 
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90 
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Al 3 
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V cli 
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Pro 


Ser 


Leu 
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110 


Ash 


Arg 


Ser 


Arg 


Leu 


Met 


Ser 


Leu 


Va 1 

v ai 


Ala 


Ala 


Asn 


He 


Leu Gly Gin 






115 










120 










125 




Ash 


Ser 


Ala 


Ala 


lie 


Ala 


Ala 


Thr 


Tin 

Gin 


Ala 


Glu 


Tyr 


Ala 


Glu Met Trp 




130 










135 










140 








Asp 


Ala 


Ala 


val 


Met 


Tyr 


bet 


Tyr 


Glu 


Gly Ala 


Ser Ala Ala 


145 










150 










155 






160 


Ala 


Ser 


Ala 


Leu 


Pro 


Pro 


T"l 

Pile 


Thr 


Pro 


Pro 


Val 


Gin 


Gly Thr Gly Pro 










165 










170 








175 


Ala Gly Pro Ala Ala Ala Ala Ala Ala Thr Gln Ala Ala Gly Ala Gly

180 185 190


Ala Val Ala Asp Ala Gln Ala Thr Leu Ala Gln Leu Pro Pro Gly Ile

195 200 205




Leu Ser Asp Ile Leu Ser Ala Leu Ala Ala Asn Ala Asp Pro Leu Thr

210 215 220






Ser Gly Leu Leu Gly Ile Ala Ser Thr Leu Asn Pro Gln Val Gly Ser

225 230 235 240


Ala Gln Pro Ile Val Ile Pro Thr Pro Ile Gly Glu Leu Asp Val Ile

245 250 255


Ala Leu Tyr Ile Ala Ser Ile Ala Thr Gly Ser Ile Ala Leu Ala Ile

260 265 270


Thr Asn Thr Ala Arg Pro Trp His Ile Gly Leu Tyr Gly Asn Ala Gly

275 280 285




Gly Leu Gly Pro Thr Gln Gly His Pro Leu Ser Ser Ala Thr Asp Glu

290 295 300






Pro Glu Pro His Trp Gly Pro Phe Gly Gly Ala Ala Pro Val Ser Ala

305 310 315 320


Gly Val Gly His Ala Ala Leu Val Gly Ala Leu Ser Val Pro His Ser

325 330 335


Trp Thr Thr Ala Ala Pro Glu Ile Gln Leu Ala Val Gln Ala Thr Pro

340 345 350


Thr Phe Ser Ser Ser Ala Gly Ala Asp Pro Thr Ala Leu Asn Gly Met

355 360 365




Pro Ala Gly Leu Leu Ser Gly Met Ala Leu Ala Ser Leu Ala Ala Arg

370 375 380






Gly Thr Thr Gly Gly Gly Gly Thr Arg Ser Gly Thr Ser Thr Asp Gly

385 390 395 400


Gln Glu Asp Gly Arg Lys Pro Pro Val Val Val Ile Arg Glu Gln Pro

405 410 415


Pro Pro Gly Asn Pro Pro Arg

420



Mtb40 (HTCC#1) 

(2) INFORMATION FOR SEQ ID NO: 137:






(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1200 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 137:

CAGGCATGAG CAGAGCGTTC ATCATCGATC CAACGATCAG TGCCATTGAC GGCTTGTACG 60

ACCTTCTGGG GATTGGAATA CCCAACCAAG GGGGTATCCT TTACTCCTCA CTAGAGTACT 120

TCGAAAAAGC CCTGGAGGAG CTGGCAGCAG CGTTTCCGGG TGATGGCTGG TTAGGTTCGG 180

CCGCGGACAA ATACGCCGGC AAAAACCGCA ACCACGTGAA TTTTTTCCAG GAACTGGCAG 240

ACCTCGATCG TCAGCTCATC AGCCTGATCC ACGACCAGGC CAACGCGGTC CAGACGACCC 300

GCGACATCCT GGAGGGCGCC AAGAAAGGTC TCGAGTTCGT GCGCCCGGTG GCTGTGGACC 360

TGACCTACAT CCCGGTCGTC GGGCACGCCC TATCGGCCGC CTTCCAGGCG CCGTTTTGCG 420

CGGGCGCGAT GGCCGTAGTG GGCGGCGCGC TTGCCTACTT GGTCGTGAAA ACGCTGATCA 480

ACGCGACTCA ACTCCTCAAA TTGCTTGCCA AATTGGCGGA GTTGGTCGCG GCCGCCATTG 540

CGGACATCAT TTCGGATGTG GCGGACATCA TCAAGGGCAC CCTCGGAGAA GTGTGGGAGT 600

TCATCACAAA CGCGCTCAAC GGCCTGAAAG AGCTTTGGGA CAAGCTCACG GGGTGGGTGA 660

CCGGACTGTT CTCTCGAGGG TGGTCGAACC TGGAGTCCTT CTTTGCGGGC GTCCCCGGCT 720

TGACCGGCGC GACCAGCGGC TTGTCGCAAG TGACTGGCTT GTTCGGTGCG GCCGGTCTGT 780

CCGCATCGTC GGGCTTGGCT CACGCGGATA GCCTGGCGAG CTCAGCCAGC TTGCCCGCCC 840

TGGCCGGCAT TGGGGGCGGG TCCGGTTTTG GGGGCTTGCC GAGCCTGGCT CAGGTCCATG 900

CCGCCTCAAC TCGGCAGGCG CTACGGCCCC GAGCTGATGG CCCGGTCGGC GCCGCTGCCG 960

AGCAGGTCGG CGGGCAGTCG CAGCTGGTCT CCGCGCAGGG TTCCCAAGGT ATGGGCGGAC 1020

CCGTAGGCAT GGGCGGCATG CACCCCTCTT CGGGGGCGTC GAAAGGGACG ACGACGAAGA 1080

AGTACTCGGA AGGCGCGGCG GCGGGCACTG AAGACGCCGA GCGCGCGCCA GTCGAAGCTG 1140

ACGCGGGCGG TGGGCAAAAG GTGCTGGTAC GAAACGTCGT CTAACGGCAT GGCGAGCCAA 1200



(2) INFORMATION FOR SEQ ID NO: 138:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 392 amino acids

(B) TYPE: amino acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 138:

Met Ser Arg Ala Phe Ile Ile Asp Pro Thr Ile Ser Ala Ile Asp Gly
1 5 10 15

Leu Tyr Asp Leu Leu Gly Ile Gly Ile Pro Asn Gln Gly Gly Ile Leu
20 25 30

Tyr Ser Ser Leu Glu Tyr Phe Glu Lys Ala Leu Glu Glu Leu Ala Ala
35 40 45

Ala Phe Pro Gly Asp Gly Trp Leu Gly Ser Ala Ala Asp Lys Tyr Ala
50 55 60

Gly Lys Asn Arg Asn His Val Asn Phe Phe Gln Glu Leu Ala Asp Leu
65 70 75 80

Asp Arg Gln Leu Ile Ser Leu Ile His Asp Gln Ala Asn Ala Val Gln
85 90 95

Thr Thr Arg Asp Ile Leu Glu Gly Ala Lys Lys Gly Leu Glu Phe Val
100 105 110

Arg Pro Val Ala Val Asp Leu Thr Tyr Ile Pro Val Val Gly His Ala
115 120 125
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Ala 
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Ala 
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Gly 
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Thr 
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Ala 
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Asp 
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Mtb9.9A (MTI-A) 

(2) INFORMATION FOR SEQ ID NO: 3:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1742 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Mycobacterium tuberculosis 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3:

CCGCTCTCTT TCAACGTCAT AAGTTCGGTG GGCCAGTCGG CCGCGCGTGC ATATGGCACC 60 
AATAACGCGT GTCCCATGGA TACCCGGACC GCACGACGGT AGAGCGGATC AGCGCAGCCG 120 






GTGCCGAACA CTACCGCGTC CACGCTCAGC CCTGCCGCGT TGCGGAAGAT CGAGCCCAGG 180

TTCTCATGGT CGTTAACGCC TTCCAACACT GCGACGGTGC GCGCCCCGGG GACGACCTGA 240

GCAACGCTCG GCTCCGGCAC CCGGCGCGGG GCTGCCAACA CCCGACGATT GAGATGGAAG 300

CCGATCACCC GTGCCATGAC ATCAGCCGAC GCTCGATAGT ACGGCGCGGC GACAGCGGCC 360

AGATCATCCT TGAGCTCGGC CAGCCGGCGG TCGGTGCCGA ACAGCGCCAG CGGCGTGAAC 420

CGTGAGGCCA GCATGCGCTG CACCACCAGC ACACCCTCGG GGATCACCAA CGCCTTGCCG 480

GTCGGCAGAT CGGGACNACN GTCGATGCTG TTCAGGTCAC GGAAATCGTC GAGGCGTGGG 540

TCGTCGGGAT CGCAGACGTC CTGAACATCG AGGCCGTCGG GGTGCTGGGG ACAAGGGCCT 600

TCGGTCACGG GCTTTCGTCG ACCAGAGCCA GCATCAGATC GGCGGCGCTG CGCAGGATGT 660

CACGCTCGCT GCGGTTCAGC GTCGCGAGCC GCTCAGCCAG CCACTCTTGC AGAGAGCCGT 720

TGCTGGGATT AATTGGGAGA GGAAGACAGC ATGTCGTTGG TGACCACACA GCCGGAAGCG 780

CTGGCAGCTG CGGCGGCGAA CCTACAGGGT ATTGGCACGA CAATGAACGC CCAGAACGCG 840

GCCGCGGCTG CTCCAACCAC CGGAGTAGTG CCCGCAGCCG CCGATGAAGT ATCAGCGCTG 900

ACCGCGGGTC AGTTTGCTGC GCACGCGCAG ATGTACCAAA CGGTCAGCGC CCAGGCCGCG 960

GCCATTCACG AAATGTTCGT GAACACGCTG GTGGCCAGTT CTGGCTCATA CGCGGCCACC 1020

GAGGCGGCCA ACGCAGCCGC TGCCGGCTGA ACGGGCTCGC AGGAACCTGC TGAAGGAGAG 1080

GGGGAACATC CGGAGTTCTC GGGTCAGGGG TTGCGCCAGC GCCCAGCCGA TTCAGNTATC 1140

GGCGTCCATA ACAGCAGACG ATCTAGGCAT TCAGTACTAA GGAGACAGGC AAGATGGCCT 1200

CACGTTTTAT GACGGATCCG CATGCGATGC GGGACATGGC GGGCCGTTTT GAGGTGCACG 1260

CCCAGACGGT GGAGGACGAG GGTCGGCGGA TGTGGGCGTC CGCGGAAAAC ATTTCCGGTG 1320

CGGGCTGGAG TGGCATGGCC GAGGCGACCT CGCTAGACAC CATGACCTAG ATGAATCAGG 1380

CGTTTCGCAA CATCGTGAAC ATGCTGCAGG GGGTGGGTGA CGGGCTGGTT CGCGACGCCA 1440

ACAANTACGA ACAGCAAGAG GAGGCCTCCG AGCAGATCCT GAGCAGNTAG CGCCGAAAGC 1500

CACAGCTGNG TACGNTTTCT CAGATTAGGA GAACACCAAT ATGACGATTA ATTACCAGTT 1560

CGGGGACGTC GACGCTCATG GCGCCATGAT CCGCGGTCAG GCGGCGTCGC TTGAGGCGGA 1620

GCATCAGGCG ATCGTTCGTG ATGTGTTGGC CGGGGGTGAC TTTTGGGGCG GCGCGGGTTC 1680

GGTGGCTTGC CAGGAGTTGA TTACCCAGTT GGGCCGTAAC TTCCAGGTGA TCTACGAGCA 1740

GG 1742



(2) INFORMATION FOR SEQ ID NO: 4:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2836 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Mycobacterium tuberculosis 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4:

GTTGATTCCG TTCGCGGCGC CGCCGAAGAC CACCAACTCC GCTGGGGTGG TCGCACAGGC 60 

GGTTGCGTCG GTCAGCTGGC CGAATCCCAA TGATTGGTGG CTCNGTGCGG TTGCTGGGCT 120 

CGATTACCCC CACGGAAAGG ACGACGATCG TTCGTTTGCT CGGTCAGTCG TACTTGGCGA 180 

CGGGCATGGC GCGGTTTCTT ACCTCGATCG CACAGCAGCT GACCTTCGGG CCAGGGGGCA 240 

CAACGGCTGG CTCCGGCGGA GCCTGGTACC CAACGCCACA ATTCGCCGGC CTGGGTGCAG 300 

GCCCGGCGGT GTCGGCGAGT TTGGCGCGGG CGGAGCCGGT CGGGAGGTTG TCGGTGCCGC 360

CAAGTTGGGC CGTCGCGGCT CCGGCCTTCG CGGAGAAGCC TGAGGGGGGC ACGCCGATGT 420

CCGTCATCGG CGAAGCGTCC AGCTGCGGTC AGGGAGGCCT GCTTCGAGGC ATACCGCTGG 480

CGAGAGCGGG GCGGCGTACA GGCGCCTTCG CTCACCGATA CGGGTTCCGC CACAGCGTGA 540

TTACCCGGTC TCCGTCGGCG GGATAGCTTT CGATCCGGTC TGCGCGGCCG CCGGAAATGC 600

TGCAGATAGC GATCGACCGC GCCGGTCGGT AAACGCCGCA CACGGCACTA TCAATGCGCA 660

CGGCGGGCGT TGATGCCAAA TTGACCGTCC CGACGGGGCT TTATCTGCGG CAAGATTTCA 720

TCCCCAGCCC GGTCGGTGGG CCGATAAATA CGCTGGTCAG CGCGACTCTT CCGGCTGAAT 780

TCGATGCTCT GGGCGCCCGC TCGACGCCGA GTATCTCGAG TGGGCCGCAA ACCCGGTCAA 840

ACGCTGTTAC TGTGGCGTTA CCACAGGTGA ATTTGCGGTG CCAACTGGTG AACACTTGCG 900

AACGGGTGGC ATCGAAATCA ACTTGTTGCG TTGCAGTGAT CTACTCTCTT GCAGAGAGCC 960

GTTGCTGGGA TTAATTGGGA GAGGAAGACA GCATGTCGTT CGTGACCACA CAGCCGGAAG 1020

CCCTGGCAGC TGCGGCGGCG AACCTACAGG GTATTGGCAC GACAATGAAC GCCCAGAACG 1080

CGGCCGCGGC TGGTCCAACC ACCGGAGTAG TGCCCGCAGC CGCCGATGAA GTATCAGCGC 1140

TGACCGCGGC TCAGTTTGCT GCGCACGCGC AGATGTACCA AACGGTCAGC GCCCAGGCCG 1200

CGGCCATTCA CGAAATGTTC GTGAACACGC TGGTGGCCAG TTCTGGCTCA TACGCGGCCA 1260

CCGAGGCGGC CAACGCAGCC GCTGCCGGCT GAACGGGCTC GCACGAACCT GCTGAAGGAG 1320

AGGGGGAACA TCCGGAGTTC TCGGGTCAGG GGTTGCGCCA GCGCCCAGCC GATTCAGCTA 1380

TCGGCGTCCA TAACAGCAGA CGATCTAGGC ATTCAGTACT AAGGAGACAG GCAACATGGC 1440

CTCACGTTTT ATGACGGATC CGCATGCGAT GCGGGACATG GCGGGCCGTT TTGAGGTGCA 1500

CGCCCAGACG GTGGAGGACG AGGCTCGCCG GATGTGGGCG TCCGCGCAAA ACATTTCGGG 1560

TGCGGGCTGG AGTGGCATGG CCGAGGCGAC CTCGCTAGAC ACCATGACCT AGATGAATCA 1620

GGCGTTTCGC AACATCGTGA ACATGCTGCA CGGGGTGCGT GACGGGCTGG TTCGCGACGC 1680

CAACAACTAC GAACAGCAAG AGCAGGCGTC CCAGCAGATC CTGAGCAGCT AGCGCCGAAA 1740

GCCACAGCTG CGTACGCTTT CTCACATTAG GAGAACACCA ATATGACGAT TAATTACCAG 1800

TTCGGGGACG TCGACGCTCA TGGCGCCATG ATCCGCGCTC AGGCGGCGTC GCTTGAGGCG 1860

GAGCATCAGG CCATCGTTCG TGATGTGTTG GCCGCGGGTG ACTTTTGGGG CGGCGCCGGT 1920

TCGGTGGCTT GCCAGGAGTT CATTACCCAG TTGGGCGGTA ACTTCCAGGT GATCTACGAG 1980

CAGGCCAACG CCCACGGGCA GAAGGTGCAG GCTGCCGGCA ACAACATGGC GCAAACCGAC 2040

AGCGCGGTCG GGTCCAGCTG GGCCTAAAAC TGAACTTCAG TCGCGGCAGC ACACCAACCA 2100

GGCGGTGTGC TGCTGTGTCC TGCAGTTAAC TAGCACTCGA CCGCTGAGGT AGCGATGGAT 2160

CAACAGAGTA CCCGCACCGA CATCACCGTC AACGTCGACG GCTTCTGGAT GCTTGAGGCG 2220

CTACTGGATA TCCGCCACGT TGCGCCTGAG TTACGTTGCC GGCCGTACGT CTCCACCGAT 2280

TCCAATGACT GGGTAAACGA GCACCCGGGG ATGGCGGTCA TGCGCGAGCA GGGCATTGTC 2340

GTCAACGACG CGGTCAACGA ACAGGTCGCT GCCCGGATGA AGGTGCTTGC CGCACCTGAT 2400

CTTGAAGTCG TCGCCCTGCT GTCACGCGGC AAGTTGCTGT ACGGGGTCAT AGACGACGAG 2460

AACCAGCCGC CGGGTTCGCG TGACATCCCT GACAATGAGT TCCGGGTGGT GTTGGCCCGG 2520

CGAGGCCAGC ACTGGGTGTC GGCGGTACGG GTTGGCAATG ACATCACCGT CGATGACGTG 2580

ACGGTCTCGG ATAGCGCCTC GATCGCCGCA CTGGTAATGG ACGGTCTGGA GTCGATTCAC 2640

CACGCCGACC CAGCCGCGAT CAACGCGGTC AACGTGCCAA TGGAGGAGAT CTCGTGCCGA 2700

ATTCGGCACG AGGCACGAGG CGGTGTCGGT GACGACGGGA TCGATCACGA TCATCGACCG 2760

GCCGGGATCC TTGGCGATCT CGTTGAGCAC GACCCGGGCC CGCGGGAAGC TCTGCGACAT 2820

CCATGGGTTC TTCCCG 2836



(2) INFORMATION FOR SEQ ID NO: 29:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 94 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Mycobacterium tuberculosis

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 29:

Met Thr Ile Asn Tyr Gln Phe Gly Asp Val Asp Ala His Gly Ala Met

1 5 10 15

Ile Arg Ala Leu Ala Gly Leu Leu Glu Ala Glu His Gln Ala Ile Ile

20 25 30

Ser Asp Val Leu Thr Ala Ser Asp Phe Trp Gly Gly Ala Gly Ser Ala

35 40 45

Ala Cys Gln Gly Phe Ile Thr Gln Leu Gly Arg Asn Phe Gln Val Ile

50 55 60

Tyr Glu Gln Ala Asn Ala His Gly Gln Lys Val Gln Ala Ala Gly Asn

65 70 75 80

Asn Met Ala Gln Thr Asp Ser Ala Val Gly Ser Ser Trp Ala

85 90



Mtb9.9A (MTI-A) ORF peptides 

(2) INFORMATION FOR SEQ ID NO: 51: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 15 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Mycobacterium tuberculosis 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 51: 

Met Thr Ile Asn Tyr Gln Phe Gly Asp Val Asp Ala His Gly Ala
1 5 10 15

(2) INFORMATION FOR SEQ ID NO: 52: 

(i) SEQUENCE CHARACTERISTICS : 

(A) LENGTH: 15 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Mycobacterium tuberculosis

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 52: 

Gln Phe Gly Asp Val Asp Ala His Gly Ala Met Ile Arg Ala Gln
1 5 10 15

(2) INFORMATION FOR SEQ ID NO: 53: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 15 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Mycobacterium tuberculosis

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 53:

Asp Ala His Gly Ala Met Ile Arg Ala Gln Ala Ala Ser Leu Glu
1 5 10 15

(2) INFORMATION FOR SEQ ID NO:54: 

(i) SEQUENCE CHARACTERISTICS : 

(A) LENGTH: 15 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Mycobacterium tuberculosis 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 54: 

Met Ile Arg Ala Gln Ala Ala Ser Leu Glu Ala Glu His Gln Ala
1 5 10 15

(2) INFORMATION FOR SEQ ID NO: 55: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 15 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Mycobacterium tuberculosis 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 55: 

Ala Ala Ser Leu Glu Ala Glu His Gln Ala Ile Val Arg Asp Val
1 5 10 15

(2) INFORMATION FOR SEQ ID NO: 56: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 15 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Mycobacterium tuberculosis 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 56:

Ala Glu His Gln Ala Ile Val Arg Asp Val Leu Ala Ala Gly Asp
1 5 10 15

(2) INFORMATION FOR SEQ ID NO: 57: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 15 amino acids 

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Mycobacterium tuberculosis 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 57:

Ile Val Arg Asp Val Leu Ala Ala Gly Asp Phe Trp Gly Gly Ala
1 5 10 15

(2) INFORMATION FOR SEQ ID NO: 58: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 16 amino acids

(B) TYPE: amino acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Mycobacterium tuberculosis 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 58:

Leu Ala Ala Gly Asp Phe Trp Gly Gly Ala Gly Ser Val Ala Cys Gln
1 5 10 15

(2) INFORMATION FOR SEQ ID NO: 59: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 15 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Mycobacterium tuberculosis 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 59:

Phe Trp Gly Gly Ala Gly Ser Val Ala Cys Gln Glu Phe Ile Thr
1 5 10 15

(2) INFORMATION FOR SEQ ID NO: 60:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 15 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Mycobacterium tuberculosis 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 60:

Gly Ser Val Ala Cys Gln Glu Phe Ile Thr Gln Leu Gly Arg Asn
1 5 10 15

(2) INFORMATION FOR SEQ ID NO: 61: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Mycobacterium tuberculosis 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 61: 

Gln Glu Phe Ile Thr Gln Leu Gly Arg Asn Phe Gln Val Ile Tyr Glu
1 5 10 15

Gln Ala

(2) INFORMATION FOR SEQ ID NO: 62: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 15 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Mycobacterium tuberculosis 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 62: 

Arg Asn Phe Gln Val Ile Tyr Glu Gln Ala Asn Ala His Gly Gln
1 5 10 15

(2) INFORMATION FOR SEQ ID NO: 63: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 15 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE : peptide 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Mycobacterium tuberculosis 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 63:

Ile Tyr Glu Gln Ala Asn Ala His Gly Gln Lys Val Gln Ala Ala
1 5 10 15

(2) INFORMATION FOR SEQ ID NO: 64:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 15 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Mycobacterium tuberculosis

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 64:

Asn Ala His Gly Gln Lys Val Gln Ala Ala Gly Asn Asn Met Ala
1 5 10 15

(2) INFORMATION FOR SEQ ID NO: 65:

(i) SEQUENCE CHARACTERISTICS : 

(A) LENGTH: 15 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Mycobacterium tuberculosis 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 65: 

Lys Val Gln Ala Ala Gly Asn Asn Met Ala Gln Thr Asp Ser Ala
1 5 10 15

(2) INFORMATION FOR SEQ ID NO: 66: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 16 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Mycobacterium tuberculosis 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 66: 

Gly Asn Asn Met Ala Gln Thr Asp Ser Ala Val Gly Ser Ser Trp Ala
1 5 10 15



Mtb9.8 (MSL) 

(2) INFORMATION FOR SEQ ID NO: 12: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 585 base pairs 
(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Mycobacterium tuberculosis 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12:

TGGATTCCGA TAGCGGTTTC GGCCCCTCGA CGGGCGACCA CGGCGCGCAG GCCTCCGAAC 60 

GGGGGGCCGG GACGCTGGGA TTCGCCGGGA CCGCAACCAA AGAACGCCGG GTCCGGGCGG 120

TCGGGCTGAC CGCACTGGCC GGTGATGAGT TCGGCAACGG CCCCCGGATG CCGATGGTGC 180

CGGGGACCTG GGAGCAGGGC AGCAACGAGC CCGAGGCGCC CGACGGATCG GGGAGAGGGG 240

GAGGCGACGG CTTACCGCAC GACAGCAAGT AACCGAATTC CGAATCACGT GGACCCGTAC 300

GGGTCGAAAG GAGAGATGTT ATGAGCCTTT TGGATGCTCA TATCCCACAG TTGGTGGCCT 360

CCCAGTCGGC GTTTGCCGCC AAGGCGGGGC TGATGCGGCA CACGATCGGT CAGGCCGAGC 420 

AGGCGGCGAT GTCGGCTCAG GCGTTTCACC AGGGGGAGTC GTCGGCGGCG TTTCAGGCCG 480 

CCCATGCCCG GTTTGTGGCG GCGGCCGCCA AAGTCAACAC CTTGTTGGAT GTCGCGCAGG 540

CGAATCTGGG TGAGGCCGCC GGTACCTATG TGGCCGCCGA TGCTG 585 



(2) INFORMATION FOR SEQ ID NO: 109:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 97 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 109: 

Met Ser Leu Leu Asp Ala His Ile Pro Gln Leu Val Ala Ser Gln Ser

1 5 10 15

Ala Phe Ala Ala Lys Ala Gly Leu Met Arg His Thr Ile Gly Gln Ala

20 25 30

Glu Gln Ala Ala Met Ser Ala Gln Ala Phe His Gln Gly Glu Ser Ser

35 40 45

Ala Ala Phe Gln Ala Ala His Ala Arg Phe Val Ala Ala Ala Ala Lys

50 55 60

Val Asn Thr Leu Leu Asp Val Ala Gln Ala Asn Leu Gly Glu Ala Ala

65 70 75 80

Gly Thr Tyr Val Ala Ala Asp Ala Ala Ala Ala Ser Thr Tyr Thr Gly

85 90 95

Phe



Mtb9.8 ORF peptides 

(2) INFORMATION FOR SEQ ID NO: 110: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 15 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 110:

Met Ser Leu Leu Asp Ala His Ile Pro Gln Leu Val Ala Ser Gln
1 5 10 15

(2) INFORMATION FOR SEQ ID NO: 111: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 15 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 111: 

Ala His Ile Pro Gln Leu Val Ala Ser Gln Ser Ala Phe Ala Ala
1 5 10 15 

(2) INFORMATION FOR SEQ ID NO: 112:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 15 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 112: 

Leu Val Ala Ser Gln Ser Ala Phe Ala Ala Lys Ala Gly Leu Met
1 5 10 15

(2) INFORMATION FOR SEQ ID NO: 113:

(i) SEQUENCE CHARACTERISTICS : 

(A) LENGTH: 15 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE : peptide 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 113: 

Ser Ala Phe Ala Ala Lys Ala Gly Leu Met Arg His Thr Ile Gly
1 5 10 15 

(2) INFORMATION FOR SEQ ID NO: 114:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 15 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 114: 

Lys Ala Gly Leu Met Arg His Thr Ile Gly Gln Ala Glu Gln Ala
1 5 10 15

(2) INFORMATION FOR SEQ ID NO: 115: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 15 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 115: 

Arg His Thr Ile Gly Gln Ala Glu Gln Ala Ala Met Ser Ala Gln
1 5 10 15

(2) INFORMATION FOR SEQ ID NO: 116:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 15 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE : peptide 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 116:

Gln Ala Glu Gln Ala Ala Met Ser Ala Gln Ala Phe His Gln Gly
1 5 10 15



(2) INFORMATION FOR SEQ ID NO: 117: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 15 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE : peptide 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 117: 

Ala Met Ser Ala Gln Ala Phe His Gln Gly Glu Ser Ser Ala Ala
1 5 10 15

(2) INFORMATION FOR SEQ ID NO: 118: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 15 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 118:

Ala Phe His Gln Gly Glu Ser Ser Ala Ala Phe Gln Ala Ala His
1 5 10 15

(2) INFORMATION FOR SEQ ID NO: 119: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 15 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 119:

Glu Ser Ser Ala Ala Phe Gln Ala Ala His Ala Arg Phe Val Ala
1 5 10 15

(2) INFORMATION FOR SEQ ID NO: 120: 

(i) SEQUENCE CHARACTERISTICS : 

(A) LENGTH: 15 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 120:

Phe Gln Ala Ala His Ala Arg Phe Val Ala Ala Ala Ala Lys Val
1 5 10 15



(2) INFORMATION FOR SEQ ID NO: 121: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 15 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 121:

Ala Arg Phe Val Ala Ala Ala Ala Lys Val Asn Thr Leu Leu Asp 
1 5 10 15 

(2) INFORMATION FOR SEQ ID NO: 122:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH : 15 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 

(xi) SEQUENCE DESCRIPTION : SEQ ID NO:122: 

Ala Ala Ala Lys Val Asn Thr Leu Leu Asp Val Ala Gln Ala Asn
1 5 10 15

(2) INFORMATION FOR SEQ ID NO: 123: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 15 amino acids

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE : peptide 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 123:

Asn Thr Leu Leu Asp Val Ala Gln Ala Asn Leu Gly Glu Ala Ala
1 5 10 15

(2) INFORMATION FOR SEQ ID NO: 124: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 124:

Val Ala Gln Ala Asn Leu Gly Glu Ala Ala Gly Thr Tyr Val Ala Ala

1 5 10 15

Asp Ala



Mtb39A (TbH9) 

(2) INFORMATION FOR SEQ ID NO: 106: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 3058 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 106:

GATCGTACCC GTGCGAGTGC TCGGGCCGTT TGAGGATGGA GTGCACGTGT CTTTCGTGAT 60 

GGCATACCCA GAGATGTTGG CGGCGGCGGC TGACACCCTG CAGAGCATCG GTGCTACCAC 120 

TGTGGCTAGC AATGCCGCTG CGGCGGCCCC GACGACTGGG GTGGTGCCCC CCGCTGCCGA 180

TGAGGTGTCG GCGCTGACTG CGGCGCACTT CGCCGCACAT GCGGCGATGT ATCAGTCCGT 240

GAGCGCTCGG GCTGCTGCGA TTCATGACCA GTTCGTGGCC ACCCTTGCCA GCAGCGCCAG 300

CTCGTATGCG GCCACTGAAG TCGCCAATGC GGCGGCGGCC AGCTAAGCCA GGAACAGTCG 360

GCACGAGAAA CCACGAGAAA TAGGGACACG TAATGGTGGA TTTCGGGGCG TTACCACCGG 420

AGATCAACTC CGCGAGGATG TACGCCGGCC CGGGTTCGGC CTCGCTGGTG GCCGCGGCTC 480

AGATGTGGGA CAGCGTGGCG AGTGACCTGT TTTCGGCCGC GTCGGCGTTT CAGTCGGTGG 540

TCTGGGGTCT GACGGTGGGG TCGTGGATAG GTTCGTCGGC GGGTCTGATG GTGGCGGCGG 600

CCTCGCCGTA TGTGGCGTGG ATGAGCGTCA CCGCGGGGCA GGCCGAGCTG ACCGCCGCCC 660

AGGTCCGGGT TGCTGCGGCG GCCTACGAGA CGGCGTATGG GCTGACGGTG CCCCCGCCGG 720

TGATCGCCGA GAACCGTGCT GAACTGATGA TTCTGATAGC GACCAACCTC TTGGGGCAAA 780

ACACCCCGGC GATCGCGGTC AACGAGGCCG AATACGGCGA GATGTGGGCC CAAGACGCCG 840

CCGCGATGTT TGGCTACGCC GCGGCGACGG CGACGGCGAC GGCGACGTTG CTGCCGTTCG 900

AGGAGGCGCC GGAGATGACC AGCGCGGGTG GGCTCCTCGA GCAGGCCGCC GCGGTCGAGG 960

AGGCCTCCGA CACCGCCGCG GCGAACCAGT TGATGAACAA TGTGCCCCAG GCGCTGCAAC 1020

AGCTGGCCCA GCCCACGCAG GGCACCACGC CTTCTTCCAA GCTGGGTGGC CTGTGGAAGA 1080

CGGTCTCGCC GCATCGGTCG CCGATCAGCA ACATGGTGTC GATGGCCAAC AACCACATGT 1140

CGATGACCAA CTCGGGTGTG TCGATGACCA ACACCTTGAG CTCGATGTTG AAGGGCTTTG 1200

CTCCGGCGGC GGCCGCCCAG GCCGTGCAAA CCGCGGCGCA AAACGGGGTC CGGGCGATGA 1260

GCTCGCTGGG CAGCTCGCTG GGTTCTTCGG GTCTGGGCGG TGGGGTGGCC GCCAACTTGG 1320

GTCGGGCGGC CTCGGTCGGT TCGTTGTCGG TGCCGCAGGC CTGGGCCGCG GCCAACCAGG 1380

CAGTCACCCC GGCGGCGCGG GCGCTGCCGC TGACCAGCCT GACCAGCGCC GCGGAAAGAG 1440

GGCCCGGGCA GATGCTGGGC GGGCTGCCGG TGGGGCAGAT GGGCGCCAGG GCCGGTGGTG 1500

GGCTCAGTGG TGTGCTGCGT GTTCCGCCGC GACCCTATGT GATGCCGCAT TCTCCGGCGG 1560

CCGGCTAGGA GAGGGGGCGC AGACTGTCGT TATTTGACCA GTGATCGGCG GTCTCGGTGT 1620

TTCCGCGGCC GGCTATGACA ACAGTCAATG TGCATGACAA GTTACAGGTA TTAGGTCCAG 1680

GTTCAACAAG GAGACAGGCA ACATGGCCTC ACGTTTTATG ACGGATCCGC ACGCGATGCG 1740

GGACATGGCG GGCCGTTTTG AGGTGCACGC CCAGACGGTG GAGGACGAGG CTCGCCGGAT 1800

GTGGGCGTCC GCGCAAAACA TTTCCGGTGC GGGCTGGAGT GGCATGGCCG AGGCGACCTC 1860

GCTAGACACC ATGGCCCAGA TGAATCAGGC GTTTCGCAAC ATCGTGAACA TGCTGCACGG 1920

GGTGCGTGAC GGGCTGGTTC GCGACGCCAA CAACTACGAG CAGCAAGAGC AGGCCTCCGA 1980

GCAGATCCTC AGCAGCTAAC GTCAGCCGCT GCAGCACAAT ACTTTTACAA GCGAAGGAGA 2040

ACAGGTTCGA TGACCATCAA CTATCAATTC GGGGATGTCG ACGCTCACGG CGCCATGATC 2100

CGCGCTCAGG CCGGGTTGCT GGAGGCCGAG CATCAGGCCA TCATTCGTGA TGTGTTGACC 2160

GCGAGTGACT TTTGGGGCGG CGCCGGTTCG GCGGCCTGCC AGGGGTTCAT TACCCAGTTG 2220

GGCCGTAACT TCCAGGTGAT CTACGAGCAG GCCAACGCCC ACGGGCAGAA GGTGCAGGCT 2280

GCCGGCAACA ACATGGCGCA AACCGACAGC GCCGTCGGCT CCAGCTGGGC CTGACACCAG 2340

GCCAAGGCCA GGGACGTGGT GTACGAGTGA AGTTCCTCGC GTGATCCTTC GGGTGGCAGT 2400

CTAAGTGGTC AGTGCTGGGG TGTTGGTGGT TTGCTGCTTG GCGGGTTCTT CGGTGCTGGT 2460

CAGTGCTGCT CGGGCTCGGG TGAGGACCTC GAGGCGCAGG TAGCGCCGTC CTTCGATCCA 2520

TTCGTCGTGT TGTTCGGCGA GGACGGCTCC GACGAGGCGG ATGATCGAGG CGCGGTCGGG 2580

GAAGATGCCC ACGACGTCGG TTCGGCGTCG TACCTCTCGG TTGAGGCGTT CCTGGGGGTT 2640

GTTGGACCAG ATTTGGCGCC AGATCTGCTT GGGGAAGGCG GTGAACGCCA GCAGGTCGGT 2700

GCGGGCGGTG TCGAGGTGCT CGGCCACCGC GGGGAGTTTG TCGGTCAGAG CGTCGAGTAC 2760

CCGATCATAT TGGGCAACAA CTGATTCGGC GTCGGGCTGG TCGTAGATGG AGTGCAGCAG 2820

GGTGCGCACC CACGGCCAGG AGGGCTTCGG GGTGGCTGCC ATCAGATTGG CTGCGTAGTG 2880

GGTTCTGCAG CGCTGCCAGG CCGCTGCGGG CAGGGTGGCG CCGATCGCGG CCACCAGGCC 2940

GGCGTGGGCG TCGCTGGTGA CCAGGGCGAC CCCGGACAGG CCGCGGGCGA CCAGGTCGCG 3000

GAAGAACGCC AGCCAGCCGG CCCCGTCCTC GGCGGAGGTG ACCTGGATGC CCAGGATC 3058



(2) INFORMATION FOR SEQ ID NO: 107: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 391 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 107:

Met Val Asp Phe Gly Ala Leu Pro Pro Glu Ile Asn Ser Ala Arg Met
1 5 10 15

Tyr Ala Gly Pro Gly Ser Ala Ser Leu Val Ala Ala Ala Gln Met Trp
20 25 30

Asp Ser Val Ala Ser Asp Leu Phe Ser Ala Ala Ser Ala Phe Gln Ser
35 40 45

Val Val Trp Gly Leu Thr Val Gly Ser Trp Ile Gly Ser Ser Ala Gly
50 55 60

Leu Met Val Ala Ala Ala Ser Pro Tyr Val Ala Trp Met Ser Val Thr
65 70 75 80

Ala Gly Gln Ala Glu Leu Thr Ala Ala Gln Val Arg Val Ala Ala Ala
85 90 95

Ala Tyr Glu Thr Ala Tyr Gly Leu Thr Val Pro Pro Pro Val Ile Ala
100 105 110

Glu Asn Arg Ala Glu Leu Met Ile Leu Ile Ala Thr Asn Leu Leu Gly
115 120 125

Gln Asn Thr Pro Ala Ile Ala Val Asn Glu Ala Glu Tyr Gly Glu Met
130 135 140

Trp Ala Gln Asp Ala Ala Ala Met Phe Gly Tyr Ala Ala Ala Thr Ala
145 150 155 160

Thr Ala Thr Ala Thr Leu Leu Pro Phe Glu Glu Ala Pro Glu Met Thr
165 170 175






Ser Ala Gly Gly Leu Leu Glu Gln Ala Ala Ala Val Glu Glu Ala Ser
180 185 190

Asp Thr Ala Ala Ala Asn Gln Leu Met Asn Asn Val Pro Gln Ala Leu
195 200 205

Gln Gln Leu Ala Gln Pro Thr Gln Gly Thr Thr Pro Ser Ser Lys Leu
210 215 220

Gly Gly Leu Trp Lys Thr Val Ser Pro His Arg Ser Pro Ile Ser Asn
225 230 235 240

Met Val Ser Met Ala Asn Asn His Met Ser Met Thr Asn Ser Gly Val
245 250 255

Ser Met Thr Asn Thr Leu Ser Ser Met Leu Lys Gly Phe Ala Pro Ala
260 265 270

Ala Ala Ala Gln Ala Val Gln Thr Ala Ala Gln Asn Gly Val Arg Ala
275 280 285

Met Ser Ser Leu Gly Ser Ser Leu Gly Ser Ser Gly Leu Gly Gly Gly
290 295 300

Val Ala Ala Asn Leu Gly Arg Ala Ala Ser Val Gly Ser Leu Ser Val
305 310 315 320

Pro Gln Ala Trp Ala Ala Ala Asn Gln Ala Val Thr Pro Ala Ala Arg
325 330 335

Ala Leu Pro Leu Thr Ser Leu Thr Ser Ala Ala Glu Arg Gly Pro Gly
340 345 350

Gln Met Leu Gly Gly Leu Pro Val Gly Gln Met Gly Ala Arg Ala Gly
355 360 365

Gly Gly Leu Ser Gly Val Leu Arg Val Pro Pro Arg Pro Tyr Val Met
370 375 380

Pro His Ser Pro Ala Ala Gly 
385 390 



Mtb32A (TbRa35) 

(2) INFORMATION FOR SEQ ID NO: 17:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1872 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17: 
GACTACGTTG GTGTAGAAAA ATCCTGCCGC CCGGACCCTT AAGGCTGGGA CAATTTCTGA 60

TAGCTACCCC GACACAGGAG GTTACGGGAT GAGCAATTCG CGGCGCCGCT CACTCAGGTG 120

GTCATGGTTG CTGAGCGTGC TGGCTGCCGT CGGGCTGGGC CTGGCCACGG CGCCGGCCCA 180

GGCGGCCCCG CCGGCCTTGT CGCAGGACCG GTTCGCCGAC TTCCCCGCGC TGCCCCTCGA 240

CCCGTCCGCG ATGGTCGCCC AAGTGGCGCC ACAGGTGGTC AACATCAACA CCAAACTGGG 300

CTACAACAAC GCCGTGGGCG CCGGGACCGG CATCGTCATC GATCCCAACG GTGTCGTGCT 360

GACCAACAAC CACGTGATCG CGGGCGCCAC CGACATCAAT GCGTTCAGCG TCGGCTCCGG 420

CCAAACCTAC GGCGTCGATG TGGTCGGGTA TGACCGCACC CAGGATGTCG CGGTGCTGCA 480

GCTGCGCGGT GCCGGTGGCC TGCCGTCGGC GGCGATCGGT GGCGGCGTCG CGGTTGGTGA 540

GCCCGTCGTC GCGATGGGCA ACAGCGGTGG GCAGGGCGGA ACGCCCCGTG CGGTGCCTGG 600

CAGGGTGGTC GCGCTCGGCC AAACCGTGCA GGCGTCGGAT TCGCTGACCG GTGCCGAAGA 660

GACATTGAAC GGGTTGATCC AGTTCGATGC CGCAATCCAG CCCGGTGATT CGGGCGGGCC 720

CGTCGTCAAC GGCCTAGGAC AGGTGGTCGG TATGAACACG GCCGCGTCCG ATAACTTCCA 780

GCTGTCCCAG GGTGGGCAGG GATTCGCCAT TCCGATCGGG CAGGCGATGG CGATCGCGGG 840

CCAAATCCGA TCGGGTGGGG GGTCACCCAC CGTTCATATC GGGCCTACCG CCTTCCTCGG 900

CTTGGGTGTT GTCGACAACA ACGGCAACGG CGCACGAGTC CAACGCGTGG TCGGAAGCGC 960

TCCGGCGGCA AGTCTCGGCA TCTCCAGCGG CGACGTGATC ACCGCGGTCG ACGGCGCTCC 1020

GATCAACTCG GCCACCGCGA TGGCGGACGC GCTTAACGGG CATCATCCCG GTGACGTCAT 1080

CTCGGTGAAC TGGCAAACCA AGTCGGGCGG CACGCGTACA GGGAACGTGA CATTGGCCGA 1140

GGGACCCCCG GCCTGATTTG TCGCGGATAC CACCCGCCGG CCGGCCAATT GGATTGGCGC 1200

CAGCCGTGAT TGCCGCGTGA GCCCCCGAGT TCCGTCTCCC GTGCGCGTGG CATTGTGGAA 1260

GCAATGAACG AGGCAGAACA CAGCGTTGAG CACCCTCCCG TGCAGGGCAG TTACGTCGAA 1320

GGCGGTGTGG TCGAGCATCC GGATGCCAAG GACTTCGGCA GCGCCGCCGC CCTGCCCGCC 1380

GATCCGACCT GGTTTAAGCA CGCCGTCTTC TACGAGGTGC TGGTCCGGGC GTTCTTCGAC 1440

GCCAGCGCGG ACGGTTCCGN CGATCTGCGT GGACTCATCG ATCGCCTCGA CTACCTGCAG 1500

TGGCTTGGCA TCGACTGCAT CTGTTGCCGC CGTTCCTACG ACTCACCGCT GCGCGACGGC 1560

GGTTACGACA TTCGCGACTT CTACAAGGTG CTGCCCGAAT TCGGCACCGT CGACGATTTC 1620

GTCGCCCTGG TCGACACCGC TCACCGGCGA GGTATCCGCA TCATCACCGA CCTGGTGATG 1680

AATCACACCT CGGAGTCGCA CCCCTGGTTT CAGGAGTCCC GCCGCGACCC AGACGGACCG 1740

TACGGTGACT ATTACGTGTG GAGCGACACC AGCGAGCGCT ACACCGACGC CCGGATCATC 1800

TTCGTCGACA CCGAAGAGTC GAACTGGTCA TTCGATCCTG TCCGCCGACA GTTNCTACTG 1860

GCACCGATTC TT 1872



(2) INFORMATION FOR SEQ ID NO: 79: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 355 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 79:

Met Ser Asn Ser Arg Arg Arg Ser Leu Arg Trp Ser Trp Leu Leu Ser
1 5 10 15

Val Leu Ala Ala Val Gly Leu Gly Leu Ala Thr Ala Pro Ala Gln Ala
20 25 30

Ala Pro Pro Ala Leu Ser Gln Asp Arg Phe Ala Asp Phe Pro Ala Leu
35 40 45

Pro Leu Asp Pro Ser Ala Met Val Ala Gln Val Ala Pro Gln Val Val
50 55 60

Asn Ile Asn Thr Lys Leu Gly Tyr Asn Asn Ala Val Gly Ala Gly Thr
65 70 75 80

Gly Ile Val Ile Asp Pro Asn Gly Val Val Leu Thr Asn Asn His Val
85 90 95

Ile Ala Gly Ala Thr Asp Ile Asn Ala Phe Ser Val Gly Ser Gly Gln
100 105 110

Thr Tyr Gly Val Asp Val Val Gly Tyr Asp Arg Thr Gln Asp Val Ala
115 120 125

Val Leu Gln Leu Arg Gly Ala Gly Gly Leu Pro Ser Ala Ala Ile Gly
130 135 140

Gly Gly Val Ala Val Gly Glu Pro Val Val Ala Met Gly Asn Ser Gly
145 150 155 160

Gly Gln Gly Gly Thr Pro Arg Ala Val Pro Gly Arg Val Val Ala Leu
165 170 175

Gly Gln Thr Val Gln Ala Ser Asp Ser Leu Thr Gly Ala Glu Glu Thr
180 185 190

Leu Asn Gly Leu Ile Gln Phe Asp Ala Ala Ile Gln Pro Gly Asp Ser
195 200 205

Gly Gly Pro Val Val Asn Gly Leu Gly Gln Val Val Gly Met Asn Thr
210 215 220

Ala Ala Ser Asp Asn Phe Gln Leu Ser Gln Gly Gly Gln Gly Phe Ala
225 230 235 240

Ile Pro Ile Gly Gln Ala Met Ala Ile Ala Gly Gln Ile Arg Ser Gly
245 250 255

Gly Gly Ser Pro Thr Val His Ile Gly Pro Thr Ala Phe Leu Gly Leu
260 265 270

Gly Val Val Asp Asn Asn Gly Asn Gly Ala Arg Val Gln Arg Val Val
275 280 285

Gly Ser Ala Pro Ala Ala Ser Leu Gly Ile Ser Thr Gly Asp Val Ile
290 295 300

Thr Ala Val Asp Gly Ala Pro Ile Asn Ser Ala Thr Ala Met Ala Asp
305 310 315 320

Ala Leu Asn Gly His His Pro Gly Asp Val Ile Ser Val Asn Trp Gln
325 330 335

Thr Lys Ser Gly Gly Thr Arg Thr Gly Asn Val Thr Leu Ala Glu Gly
340 345 350

Pro Pro Ala
355



Mtb8.4 (DPV) 

(2) INFORMATION FOR SEQ ID NO: 101: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 500 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 101:

CGTGGCAATG TCGTTGACCG TCGGGGCCGG GGTCGCCTCC GCAGATCCCG TGGACGCGGT 60

CATTAACACC ACCTGCAATT ACGGGCAGGT AGTAGCTGCG CTCAACGCGA CGGATCCGGG 120

GGCTGCCGCA CAGTTCAACG CCTCACCGGT GGCGCAGTCC TATTTGCGCA ATTTCCTCGC 180

CGCACCGCCA CCTCAGCGCG CTGCCATGGC CGCGCAATTG CAAGCTGTGC CGGGGGCGGC 240

ACAGTACATC GGCCTTGTCG AGTCGGTTGC CGGCTCCTGC AACAACTATT AAGCCCATGC 300

GGGCCCCATC CCGCGACCCG GCATCGTCGC CGGGGCTAGG CCAGATTGCC CCGCTCCTCA 360

ACGGGCCGCA TCCCGCGACC CGGCATCGTC GCCGGGGCTA GGCCAGATTG CCCCGCTCCT 420

CAACGGGCCG CATCTCGTGC CGAATTCCTG CAGCCCGGGG GATCCACTAG TTCTAGAGCG 480

GCCGCCACCG CGGTGGAGCT 500



(2) INFORMATION FOR SEQ ID NO: 102:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 96 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 102:



Val Ala Met Ser Leu Thr Val Gly Ala Gly Val Ala Ser Ala Asp Pro
1 5 10 15

Val Asp Ala Val Ile Asn Thr Thr Cys Asn Tyr Gly Gln Val Val Ala
20 25 30

Ala Leu Asn Ala Thr Asp Pro Gly Ala Ala Ala Gln Phe Asn Ala Ser
35 40 45

Pro Val Ala Gln Ser Tyr Leu Arg Asn Phe Leu Ala Ala Pro Pro Pro
50 55 60

Gln Arg Ala Ala Met Ala Ala Gln Leu Gln Ala Val Pro Gly Ala Ala
65 70 75 80

Gln Tyr Ile Gly Leu Val Glu Ser Val Ala Gly Ser Cys Asn Asn Tyr
85 90 95





Mtb11 (Tb38-1)

(2) INFORMATION FOR SEQ ID NO: 46: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 327 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 46:

CGGCACGAGA GACCGATGCC GCTACCCTCG CGCAGGAGGC AGGTAATTTC GAGCGGATCT 60

CCGGCGACCT GAAAACCCAG ATCGACCAGG TGGAGTCGAC GGCAGGTTCG TTGCAGGGCC 120

AGTGGCGCGG CGCGGCGGGG ACGGCCGCCC AGGCCGCGGT GGTGCGCTTC CAAGAAGCAG 180

CCAATAAGCA GAAGCAGGAA CTCGACGAGA TCTCGACGAA TATTCGTCAG GCCGGCGTCC 240

AATACTCGAG GGCCGACGAG GAGCAGCAGC AGGCGCTGTC CTCGCAAATG GGCTTCTGAC 300

CCGCTAATAC GAAAAGAAAC GGAGCAA 327



(2) INFORMATION FOR SEQ ID NO: 83:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 95 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 83: 

Thr Asp Ala Ala Thr Leu Ala Gln Glu Ala Gly Asn Phe Glu Arg Ile
1 5 10 15

Ser Gly Asp Leu Lys Thr Gln Ile Asp Gln Val Glu Ser Thr Ala Gly
20 25 30

Ser Leu Gln Gly Gln Trp Arg Gly Ala Ala Gly Thr Ala Ala Gln Ala
35 40 45

Ala Val Val Arg Phe Gln Glu Ala Ala Asn Lys Gln Lys Gln Glu Leu
50 55 60

Asp Glu Ile Ser Thr Asn Ile Arg Gln Ala Gly Val Gln Tyr Ser Arg
65 70 75 80

Ala Asp Glu Glu Gln Gln Gln Ala Leu Ser Ser Gln Met Gly Phe
85 90 95



TbRa3 



(2) INFORMATION FOR SEQ ID NO: 15: 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 542 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:15: 



GAATTCGGCA CGAGAGGTGA TCGACATCAT CGGGACCAGC CCCACATCCT GGGAACAGGC 60

GGCGGCGGAG GCGGTCCAGC GGGCGCGGGA TAGCGTCGAT GACATCCGCG TCGCTCGGGT 120

CATTGAGCAG GACATGGCCG TGGACAGCGC CGGCAAGATC ACCTACCGCA TCAAGCTCGA 180

AGTGTCGTTC AAGATGAGGC CGGCGCAACC GCGCTAGCAC GGGCCGGCGA GCAAGACGCA 240

AAATCGCACG GTTTGCGGTT GATTCGTGCG ATTTTGTGTC TGCTCGCCGA GGCCTACCAG 300

GCGCGGCCCA GGTCCGCGTG CTGCCGTATC CAGGCGTGCA TCGCGATTCC GGCGGCCACG 360

CCGGAGTTAA TGCTTCGCGT CGACCCGAAC TGGGCGATCC GCCGGNGAGC TGATCGATGA 420

CCGTGGCCAG CCCGTCGATG CCCGAGTTGC CCGAGGAAAC GTGCTGCCAG GCCGGTAGGA 480

AGCGTCCGTA GGCGGCGGTG CTGACCGGCT CTGCCTGCGC CCTCAGTGCG GCCAGCGAGC 540

542 



(2) INFORMATION FOR SEQ ID NO: 77:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 66 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 77:

Val Ile Asp Ile Ile Gly Thr Ser Pro Thr Ser Trp Glu Gln Ala Ala
1 5 10 15

Ala Glu Ala Val Gln Arg Ala Arg Asp Ser Val Asp Asp Ile Arg Val
20 25 30

Ala Arg Val Ile Glu Gln Asp Met Ala Val Asp Ser Ala Gly Lys Ile
35 40 45

Thr Tyr Arg Ile Lys Leu Glu Val Ser Phe Lys Met Arg Pro Ala Gln
50 55 60

Pro Arg
65



38kD



(2) INFORMATION FOR SEQ ID NO: 154: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1993 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 154:

TGTTCTTCGA CGGCAGGCTG GTGGAGGAAG GGCCCACCGA ACAGCTGTTC TCCTCGCCGA 60

AGCATGCGGA AACCGCCCGA TACGTCGCCG GACTGTCGGG GGACGTCAAG GACGCCAAGC 120

GCGGAAATTG AAGAGCACAG AAAGGTATGG CGTGAAAATT CGTTTGCATA CGCTGTTGGC 180

CGTGTTGACC GCTGCGCCGC TGCTGCTAGC AGCGGCGGGC TGTGGCTCGA AACCACCGAG 240

CGGTTCGCCT GAAACGGGCG CCGGCGCCGG TACTGTCGCG ACTACCCCCG CGTCGTCGCC 300

GGTGACGTTG GCGGAGACCG GTAGCACGCT GCTCTACCCG CTGTTCAACC TGTGGGGTCC 360

GGCCTTTCAC GAGAGGTATC CGAACGTCAC GATCACCGCT CAGGGCACCG GTTCTGGTGC 420

CGGGATCGCG CAGGCCGCCG CCGGGACGGT CAACATTGGG GCCTCCGACG CCTATCTGTC 480

GGAAGGTGAT ATGGCCGCGC ACAAGGGGCT GATGAACATC GCGCTAGCCA TCTCCGCTCA 540

GCAGGTCAAC TACAACCTGC CCGGAGTGAG CGAGCACCTC AAGCTGAACG GAAAAGTCCT 600

GGCGGCCATG TACCAGGGCA CCATCAAAAC CTGGGACGAC CCGCAGATCG CTGCGCTCAA 660

CCCCGGCGTG AACCTGCCCG GCACCGCGGT AGTTCCGCTG CACCGCTCCG ACGGGTCCGG 720

TGACACCTTC TTGTTCACCC AGTACCTGTC CAAGCAAGAT CCCGAGGGCT GGGGCAAGTC 780

GCCCGGCTTC GGCACCACCG TCGACTTCCC GGCGGTGCCG GGTGCGCTGG GTGAGAACGG 840

CAACGGCGGC ATGGTGACCG GTTGCGCCGA GACACCGGGC TGCGTGGCCT ATATCGGCAT 900

CAGCTTCCTC GACCAGGCCA GTCAACGGGG ACTCGGCGAG GCCCAACTAG GCAATAGCTC 960

TGGCAATTTC TTGTTGCCCG ACGCGCAAAG CATTCAGGCC GCGGCGGCTG GCTTCGCATC 1020

GAAAACCCCG GCGAACCAGG CGATTTCGAT GATCGACGGG CCCGCCCCGG ACGGCTACCC 1080

GATCATCAAC TACGAGTACG CCATCGTCAA CAACCGGCAA AAGGACGCCG CCACCGCGCA 1140

GACCTTGCAG GCATTTCTGC ACTGGGCGAT CACCGACGGC AACAAGGCCT CGTTCCTCGA 1200

CCAGGTTCAT TTCCAGCCGC TGCCGCCCGC GGTGGTGAAG TTGTCTGACG CGTTGATCGC 1260

GACGATTTCC AGCTAGCCTC GTTGACCACC ACGCGACAGC AACCTCCGTC GGGCCATCGG 1320

GCTGCTTTGC GGAGCATGCT GGCCCGTGCC GGTGAAGTCG GCCGCGCTGG CCCGGCCATC 1380

CGGTGGTTGG GTGGGATAGG TGCGGTGATC CCGCTGCTTG CGCTGGTCTT GGTGCTGGTG 1440

GTGCTGGTCA TCGAGGCGAT GGGTGCGATC AGGCTCAACG GGTTGCATTT CTTCACCGCC 1500

ACCGAATGGA ATCCAGGCAA CACCTACGGC GAAACCGTTG TCACCGACGC GTCGCCCATC 1560

CGGTCGGCGC CTACTACGGG GCGTTGCCGC TGATCGTCGG GACGCTGGCG ACCTCGGCAA 1620

TCGCCCTGAT CATCGCGGTG CCGGTCTCTG TAGGAGCGGC GCTGGTGATC GTGGAACGGC 1680

TGCCGAAACG GTTGGCCGAG GCTGTGGGAA TAGTCCTGGA ATTGCTCGCC GGAATCCCCA 1740

GCGTGGTCGT CGGTTTGTGG GGGGCAATGA CGTTCGGGCC GTTCATCGCT CATCACATCG 1800

CTCCGGTGAT CGCTCACAAC GCTCCCGATG TGCCGGTGCT GAACTACTTG CGCGGCGACC 1860

CGGGCAACGG GGAGGGCATG TTGGTGTCCG GTCTGGTGTT GGCGGTGATG GTCGTTCCCA 1920

TTATCGCCAC CACCACTCAT GACCTGTTCC GGCAGGTGCC GGTGTTGCCC CGGGAGGGCG 1980

CGATCGGGAA TTC 1993



(2) INFORMATION FOR SEQ ID NO: 155: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 374 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 155:

Met Lys Ile Arg Leu His Thr Leu Leu Ala Val Leu Thr Ala Ala Pro
1 5 10 15

Leu Leu Leu Ala Ala Ala Gly Cys Gly Ser Lys Pro Pro Ser Gly Ser
20 25 30

Pro Glu Thr Gly Ala Gly Ala Gly Thr Val Ala Thr Thr Pro Ala Ser
35 40 45

Ser Pro Val Thr Leu Ala Glu Thr Gly Ser Thr Leu Leu Tyr Pro Leu
50 55 60

Phe Asn Leu Trp Gly Pro Ala Phe His Glu Arg Tyr Pro Asn Val Thr
65 70 75 80

Ile Thr Ala Gln Gly Thr Gly Ser Gly Ala Gly Ile Ala Gln Ala Ala
85 90 95

Ala Gly Thr Val Asn Ile Gly Ala Ser Asp Ala Tyr Leu Ser Glu Gly
100 105 110

Asp Met Ala Ala His Lys Gly Leu Met Asn Ile Ala Leu Ala Ile Ser
115 120 125

Ala Gln Gln Val Asn Tyr Asn Leu Pro Gly Val Ser Glu His Leu Lys
130 135 140

Leu Asn Gly Lys Val Leu Ala Ala Met Tyr Gln Gly Thr Ile Lys Thr
145 150 155 160

Trp Asp Asp Pro Gln Ile Ala Ala Leu Asn Pro Gly Val Asn Leu Pro
165 170 175

Gly Thr Ala Val Val Pro Leu His Arg Ser Asp Gly Ser Gly Asp Thr
180 185 190

Phe Leu Phe Thr Gln Tyr Leu Ser Lys Gln Asp Pro Glu Gly Trp Gly
195 200 205

Lys Ser Pro Gly Phe Gly Thr Thr Val Asp Phe Pro Ala Val Pro Gly
210 215 220

Ala Leu Gly Glu Asn Gly Asn Gly Gly Met Val Thr Gly Cys Ala Glu
225 230 235 240

Thr Pro Gly Cys Val Ala Tyr Ile Gly Ile Ser Phe Leu Asp Gln Ala
245 250 255

Ser Gln Arg Gly Leu Gly Glu Ala Gln Leu Gly Asn Ser Ser Gly Asn
260 265 270

Phe Leu Leu Pro Asp Ala Gln Ser Ile Gln Ala Ala Ala Ala Gly Phe
275 280 285

Ala Ser Lys Thr Pro Ala Asn Gln Ala Ile Ser Met Ile Asp Gly Pro
290 295 300

Ala Pro Asp Gly Tyr Pro Ile Ile Asn Tyr Glu Tyr Ala Ile Val Asn
305 310 315 320

Asn Arg Gln Lys Asp Ala Ala Thr Ala Gln Thr Leu Gln Ala Phe Leu
325 330 335

His Trp Ala Ile Thr Asp Gly Asn Lys Ala Ser Phe Leu Asp Gln Val
340 345 350

His Phe Gln Pro Leu Pro Pro Ala Val Val Lys Leu Ser Asp Ala Leu
355 360 365

Ile Ala Thr Ile Ser Ser
370
365 



DPEP 

(2) INFORMATION FOR SEQ ID NO: 52: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 999 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:52: 

ATGCATCACC ATCACCATCA CATGCATCAG GTGGACCCCA ACTTGACACG TCGCAAGGGA 60

CGATTGGCGG CACTGGCTAT CGCGGCGATG GCCAGCGCCA GCCTGGTGAC CGTTGCGGTG 120

CCCGCGACCG CCAACGCCGA TCCGGAGCCA GCGCCCCCGG TACCCACAAC GGCCGCCTCG 180

CCGCCGTCGA CCGCTGCAGC GCCACCCGCA CCGGCGACAC CTGTTGCCCC CCCACCACCG 240

GCCGCCGCCA ACACGCCGAA TGCCCAGCCG GGCGATCCCA ACGCAGCACC TCCGCCGGCC 300

GACCCGAACG CACCGCCGCC ACCTGTCATT GCCCCAAACG CACCCCAACC TGTCCGGATC 360

GACAACCCGG TTGGAGGATT CAGCTTCGCG CTGCCTGCTG GCTGGGTGGA GTCTGACGCC 420

GCCCACTTCG ACTACGGTTC AGCACTCCTC AGCAAAACCA CCGGGGACCC GCCATTTCCC 480

GGACAGCCGC CGCCGGTGGC CAATGACACC CGTATCGTGC TCGGCCGGCT AGACCAAAAG 540

CTTTACGCCA GCGCCGAAGC CACCGACTCC AAGGCCGCGG CCCGGTTGGG CTCGGACATG 600

GGTGAGTTCT ATATGCCCTA CCCGGGCACC CGGATCAACC AGGAAACCGT CTCGCTCGAC 660

GCCAACGGGG TGTCTGGAAG CGCGTCGTAT TACGAAGTCA AGTTCAGCGA TCCGAGTAAG 720

CCGAACGGCC AGATCTGGAC GGGCGTAATC GGCTCGCCCG CGGCGAACGC ACCGGACGCC 780

GGGCCCCCTC AGCGCTGGTT TGTGGTATGG CTCGGGACCG CCAACAACCC GGTGGACAAG 840

GGCGCGGCCA AGGCGCTGGC CGAATCGATC CGGCCTTTGG TCGCCCCGCC GCCGGCGCCG 900

GCACCGGCTC CTGCAGAGCC CGCTCCGGCG CCGGCGCCGG CCGGGGAAGT CGCTCCTACC 960

CCGACGACAC CGACACCGCA GCGGACCTTA CCGGCCTGA 999



(2) INFORMATION FOR SEQ ID NO: 53: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 332 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 53:

Met His His His His His His Met His Gln Val Asp Pro Asn Leu Thr
1 5 10 15

Arg Arg Lys Gly Arg Leu Ala Ala Leu Ala Ile Ala Ala Met Ala Ser
20 25 30

Ala Ser Leu Val Thr Val Ala Val Pro Ala Thr Ala Asn Ala Asp Pro
35 40 45

Glu Pro Ala Pro Pro Val Pro Thr Thr Ala Ala Ser Pro Pro Ser Thr
50 55 60

Ala Ala Ala Pro Pro Ala Pro Ala Thr Pro Val Ala Pro Pro Pro Pro
65 70 75 80

Ala Ala Ala Asn Thr Pro Asn Ala Gln Pro Gly Asp Pro Asn Ala Ala
85 90 95

Pro Pro Pro Ala Asp Pro Asn Ala Pro Pro Pro Pro Val Ile Ala Pro
100 105 110

Asn Ala Pro Gln Pro Val Arg Ile Asp Asn Pro Val Gly Gly Phe Ser
115 120 125

Phe Ala Leu Pro Ala Gly Trp Val Glu Ser Asp Ala Ala His Phe Asp
130 135 140

Tyr Gly Ser Ala Leu Leu Ser Lys Thr Thr Gly Asp Pro Pro Phe Pro
145 150 155 160

Gly Gln Pro Pro Pro Val Ala Asn Asp Thr Arg Ile Val Leu Gly Arg
165 170 175

Leu Asp Gln Lys Leu Tyr Ala Ser Ala Glu Ala Thr Asp Ser Lys Ala
180 185 190

Ala Ala Arg Leu Gly Ser Asp Met Gly Glu Phe Tyr Met Pro Tyr Pro
195 200 205

Gly Thr Arg Ile Asn Gln Glu Thr Val Ser Leu Asp Ala Asn Gly Val
210 215 220

Ser Gly Ser Ala Ser Tyr Tyr Glu Val Lys Phe Ser Asp Pro Ser Lys
225 230 235 240

Pro Asn Gly Gln Ile Trp Thr Gly Val Ile Gly Ser Pro Ala Ala Asn
245 250 255

Ala Pro Asp Ala Gly Pro Pro Gln Arg Trp Phe Val Val Trp Leu Gly
260 265 270

Thr Ala Asn Asn Pro Val Asp Lys Gly Ala Ala Lys Ala Leu Ala Glu
275 280 285

Ser Ile Arg Pro Leu Val Ala Pro Pro Pro Ala Pro Ala Pro Ala Pro
290 295 300

Ala Glu Pro Ala Pro Ala Pro Ala Pro Ala Gly Glu Val Ala Pro Thr
305 310 315 320

Pro Thr Thr Pro Thr Pro Gln Arg Thr Leu Pro Ala
325 330



TbH4 

(2) INFORMATION FOR SEQ ID NO: 43: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 702 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 43: 
CGGCACGAGG ATCGGTACCC CGCGGCATCG GCAGCTGCCG ATTCGCCGGG TTTCCCCACC 60

CGAGGAAAGC CGCTACCAGA TGGCGCTGCC GAAGTAGGGC GATCCGTTCG CGATGCCGGC 120

ATGAACGGGC GGCATCAAAT TAGTGCAGGA ACCTTTCAGT TTAGCGACGA TAATGGCTAT 180

AGCACTAAGG AGGATGATCC GATATGACGC AGTCGCAGAC CGTGACGGTG GATCAGCAAG 240

AGATTTTGAA CAGGGCCAAC GAGGTGGAGG CCCCGATGGC GGACCCACCG ACTGATGTCC 300

CCATCACACC GTGCGAACTC ACGGNGGNTA AAAACGCCGC CCAACAGNTG GTNTTGTCCG 360

CCGACAACAT GCGGGAATAC CTGGCGGCCG GTGCCAAAGA GCGGCAGCGT CTGGCGACCT 420

CGCTGCGCAA CGCGGCCAAG GNGTATGGCG AGGTTGATGA GGAGGCTGCG ACCGCGCTGG 480

ACAACGACGG CGAAGGAACT GTGCAGGCAG AATCGGCCGG GGCCGTCGGA GGGGACAGTT 540

CGGCCGAACT AACCGATACG CCGAGGGTGG CCACGGCCGG TGAACCCAAC TTCATGGATC 600

TCAAAGAAGC GGCAAGGAAG CTCGAAACGG GCGACCAAGG CGCATCGCTC GCGCACTGNG 660

QGGATGGGTG GAACACTTNC ACCCTGACGC TGCAAGGCGA CG 702



(2) INFORMATION FOR SEQ ID NO: 31: 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 285 amino acids 

(B) TYPE : amino acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 31: 





Asp Ser Phe Trp Ala Ala Ala Asp dec Ala Arg Gly Phe Val
1 5 10 15

Leu Gly Ala Thr Ala Gly Arg Thr Thr Leu Thr Gly Glu Gly Leu Gln
20 25 30

XXX S Asp AJL 9 Ser Leu Leu Leu Asp Ala -TVJL CI Thr Asn Pro Ala Val
35 40 45

Val Ala Tyr Asp Pro Ala Phe Ala Tyr Glu Ile Gly Tyr Ile Xaa Glu
50 55 60

Ser Gly Leu Ala Arg Met Cys Gly Glu Asn Pro Glu Asn Ile Phe Phe
65 70 75 80

Tyr Ile Thr Val Tyr Asn Glu Pro Tyr Val Gln Pro Pro Glu Pro Glu
85 90 95

Asn Phe Asp Pro Glu Gly Val Leu Gly Gly Ile Tyr Arg Tyr His Ala
100 105 110

Ala Thr Glu Gln Arg Thr Asn Lys Xaa Gln Ile Leu Ala Ser Gly Val
115 120 125

Ala Met Pro Ala Ala Leu Arg Ala Ala Gln Met Leu Ala Ala Glu Trp
130 135 140

Asp Val Ala Ala Asp Val Trp Ser Val Thr Ser Trp Gly Glu Leu Asn
145 150 155 160

Arg Asp Gly Val Val Ile Glu Thr Glu Lys Leu Arg His Pro Asp Arg
165 170 175

Pro Ala Gly Val Pro Tyr Val Thr Arg Ala Leu Glu Asn Ala Arg Gly
180 185 190

Pro Val Ile Ala Val Ser Asp Trp Met Arg Ala Val Pro Glu Gln Ile
195 200 205

Arg Pro Trp Val Pro Gly Thr Tyr Leu Thr Leu Gly Thr Asp Gly Phe
210 215 220

Gly Phe Ser Asp Thr Arg Pro Ala Gly Arg Arg Tyr Phe Asn Thr Asp
225 230 235 240

Ala Glu Ser Gln Val Gly Arg Gly Phe Gly Arg Gly Trp Pro Gly Arg
245 250 255

Arg Val Asn Ile Asp Pro Phe Gly Ala Gly Arg Gly Pro Pro Ala Gln
260 265 270

Leu Pro Gly Phe Asp Glu Gly Gly Gly Leu Arg Pro Xaa Lys
275 280 285



MTbRa12

(2) INFORMATION FOR SEQ ID NO: 4:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 447 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4:

CGGTATGAAC ACGGCCGCGT CCGATAACTT CCAGCTGTCC CAGGGTGGGC AGGGATTCGC 60

CATTCCGATC GGGCAGGCGA TGGCGATCGC GGGCCAGATC CGATCGGGTG GGGGGTCACC 120

CACCGTTCAT ATCGGGCCTA CCGCCTTCCT CGGCTTGGGT GTTGTCGACA ACAACGGCAA 180

CGGCGCACGA GTCCAACGCG TGGTCGGGAG CGCTCCGGCG GCAAGTCTCG GCATCTCCAC 240

CGGCGACGTG ATCACCGCGG TCGACGGCGC TCCGATCAAC TCGGCCACCG CGATGGCGGA 300

CGCGCTTAAC GGGCATCATC CCGGTGACGT CATCTCGGTG AACTGGCAAA CCAAGTCGGG 360

CGGCACGCGT ACAGGGAACG TGACATTGGC CGAGGGACCC CCGGCCTGAT TTCGTCGYGG 420

ATACCACCCG CCGGCCGGCC AATTGGA 447



(2) INFORMATION FOR SEQ ID NO: 66: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 132 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 66: 



Thr Ala Ala Ser Asp Asn Phe Gln Leu Ser Gln Gly Gly Gln Gly Phe
1 5 10 15

Ala Ile Pro Ile Gly Gln Ala Met Ala Ile Ala Gly Gln Ile Arg Ser
20 25 30

Gly Gly Gly Ser Pro Thr Val His Ile Gly Pro Thr Ala Phe Leu Gly
35 40 45

Leu Gly Val Val Asp Asn Asn Gly Asn Gly Ala Arg Val Gln Arg Val
50 55 60

Val Gly Ser Ala Pro Ala Ala Ser Leu Gly Ile Ser Thr Gly Asp Val
65 70 75 80

Ile Thr Ala Val Asp Gly Ala Pro Ile Asn Ser Ala Thr Ala Met Ala
85 90 95

Asp Ala Leu Asn Gly His His Pro Gly Asp Val Ile Ser Val Asn Trp
100 105 110

Gln Thr Lys Ser Gly Gly Thr Arg Thr Gly Asn Val Thr Leu Ala Glu
115 120 125

Gly Pro Pro Ala
130



DPPD

(2) INFORMATION FOR SEQ ID NO: 240:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 339 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: Genomic DNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 240:

ATGAAGTTGA AGTTTGCTCG CCTGAGTACT GCGATACTGG GTTGTGCAGC GGCGCTTGTG 60

TTTCCTGCCT CGGTTGCCAG CGCAGATCCA CCTGACCCGC ATCAGCCGGA CATGACGAAA 120

GGCTATTGCC CGGGTGGCCG ATGGGGTTTT GGCGACTTGG CCGTGTGCGA CGGCGAGAAG 180

TACCCCGACG GCTCGTTTTG GCACCAGTGG ATGCAAACGT GGTTTACCGG CCCACAGTTT 240

TACTTCGATT GTGTCAGCGG CGGTGAGCCC CTCCCCGGCC CGCCGCCACC GGGTGGTTGC 300

GGTGGGGCAA TTCCGTCCGA GCAGCCCAAC GCTCCCTGA 339



(2) INFORMATION FOR SEQ ID NO: 241: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 112 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 241: 



Met Lys Leu Lys Phe Ala Arg Leu Ser Thr Ala Ile Leu Gly Cys Ala
1 5 10 15

Ala Ala Leu Val Phe Pro Ala Ser Val Ala Ser Ala Asp Pro Pro Asp
20 25 30

Pro His Gln Pro Asp Met Thr Lys Gly Tyr Cys Pro Gly Gly Arg Trp
35 40 45

Gly Phe Gly Asp Leu Ala Val Cys Asp Gly Glu Lys Tyr Pro Asp Gly
50 55 60

Ser Phe Trp His Gln Trp Met Gln Thr Trp Phe Thr Gly Pro Gln Phe
65 70 75 80

Tyr Phe Asp Cys Val Ser Gly Gly Glu Pro Leu Pro Gly Pro Pro Pro
85 90 95

Pro Gly Gly Cys Gly Gly Ala Ile Pro Ser Glu Gln Pro Asn Ala Pro
100 105 110





ESAT-6 

(2) INFORMATION FOR SEQ ID NO: 103:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 154 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 103:

ATGACAGAGC AGCAGTGGAA TTTCGCGGGT ATCGAGGCCG CGGCAAGCGC AATCCAGGGA 60

AATGTCACGT CCATTCATTC CCTCCTTGAC GAGGGGAAGC AGTCCCTGAC CAAGCTCGCA 120

GCGGCCTGGG GCGGTAGCGG TTCGGAAGCG TACC 154



(2) INFORMATION FOR SEQ ID NO: 104: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 51 amino acids 

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 104:

Met Thr Glu Gln Gln Trp Asn Phe Ala Gly Ile Glu Ala Ala Ala Ser
1 5 10 15

Ala Ile Gln Gly Asn Val Thr Ser Ile His Ser Leu Leu Asp Glu Gly
20 25 30

Lys Gln Ser Leu Thr Lys Leu Ala Ala Ala Trp Gly Gly Ser Gly Ser
35 40 45

Glu Ala Tyr
50